unofficial mirror of help-gnu-emacs@gnu.org
* Chinese characters support
@ 2003-05-07 23:08 Gaoyan Xie
  2003-05-08  6:27 ` Charles Muller
       [not found] ` <mailman.5739.1052375326.21513.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 41+ messages in thread
From: Gaoyan Xie @ 2003-05-07 23:08 UTC


Hi all,

I am trying to explore GNU Emacs's multilingual support, and what I want
is to display and input Chinese characters. Have any of you done
this before? I tried following the GNU Emacs online manual, but still
couldn't make it work. BTW, I am using Red Hat Linux 7.2 and GNU Emacs 20.7.

Thanks for any help with this issue.

Gaoyan Xie

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Chinese characters support
  2003-05-07 23:08 Gaoyan Xie
@ 2003-05-08  6:27 ` Charles Muller
       [not found] ` <mailman.5739.1052375326.21513.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 41+ messages in thread
From: Charles Muller @ 2003-05-08  6:27 UTC
  Cc: help-gnu-emacs

Gaoyan Xie wrote

> I am trying to explore GNU Emacs's multilingual support, and what I want
> is to display and input Chinese characters. Have any of you done
> this before? I tried following the GNU Emacs online manual, but still
> couldn't make it work. BTW, I am using Red Hat Linux 7.2 and GNU Emacs
> 20.7.

I would recommend first that you consider installing a newer 21.x version of Emacs
if you are concerned about international script support. A newer version of
Red Hat would not hurt either. I am using RH9 with Emacs 21.2, and Chinese and
Japanese display without my having to do anything, as long as the documents
are encoded in JIS for Japanese and Big5 for Chinese.
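
Forcing the decoding when Emacs guesses wrong can also be done from Lisp. A minimal sketch, assuming the standard Mule coding-system names `chinese-big5` and `iso-2022-jp` (the usual "JIS" encoding for Japanese text); the file names are hypothetical. Interactively, `C-x RET c chinese-big5 RET C-x C-f` does the same thing.

```elisp
;; Visit files with an explicit coding system instead of relying on
;; auto-detection.  Both file names are hypothetical placeholders.
(let ((coding-system-for-read 'chinese-big5))   ; Big5 for Chinese
  (find-file "~/texts/sample-big5.txt"))

(let ((coding-system-for-read 'iso-2022-jp))    ; 7-bit JIS for Japanese
  (find-file "~/texts/sample-jis.txt"))
```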

When it comes to working with UTF-8, I have never heard of anyone succeeding in
displaying East Asian scripts without installing the TEI-Emacs add-on.

Chuck

---------------------------
Charles Muller  <acmuller@gol.com>
Faculty of Humanities,  Toyo Gakuen University
Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net]
H-Buddhism List Editor [http://www2.h-net.msu.edu/~buddhism/]
Mobile Phone: 090-9310-1787

* Re: Chinese characters support
       [not found] ` <mailman.5739.1052375326.21513.help-gnu-emacs@gnu.org>
@ 2003-05-08  7:33   ` Robin Hu
  2003-05-10 14:28   ` Kai Großjohann
  1 sibling, 0 replies; 41+ messages in thread
From: Robin Hu @ 2003-05-08  7:33 UTC


>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:

    Charles> Gaoyan Xie wrote
    >> I am trying to explore GNU Emacs's multilingual support, and what
    >> I want is to display and input Chinese characters. Have any
    >> of you done this before? I tried following the GNU Emacs online
    >> manual, but still couldn't make it work. BTW, I am using Red Hat
    >> Linux 7.2 and GNU Emacs 20.7.

    Charles> I would recommend first that you consider installing a
    Charles> newer 21.x version of Emacs if you are concerned about
    Charles> international script support. A newer version of Red Hat
    Charles> would not hurt either. I am using RH9 with Emacs 21.2, and
    Charles> Chinese and Japanese display without my having to do
    Charles> anything, as long as the documents are encoded in JIS for
    Charles> Japanese and Big5 for Chinese.

    Charles> When it comes to working with UTF-8, I have never heard of
    Charles> anyone succeeding in displaying East Asian scripts without
    Charles> installing the TEI-Emacs add-on.

    I am using Mule-UCS 0.84 (with patches from Debian applied) with
    Emacs 21.3.50, and it seems to work fine. So what is the "TEI-Emacs
    add-on"? Can you point me to related URLs?
    Charles> Chuck
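
For reference, a guarded ~/.emacs sketch for the Mule-UCS setup mentioned above. It assumes the add-on's Unicode support is loaded through its `un-define` library, which should be checked against the Mule-UCS documentation; the `locate-library` guard makes the fragment harmless where Mule-UCS is not installed.

```elisp
;; Load Mule-UCS's Unicode coding systems only if the add-on is installed.
;; The library name `un-define' is an assumption; check the Mule-UCS README.
(when (locate-library "un-define")
  (require 'un-define))
```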


-- 
The goal of science is to build better mousetraps.  The goal of nature
is to build better mice.

* Re: Chinese characters support
       [not found] <mailman.5730.1052348993.21513.help-gnu-emacs@gnu.org>
@ 2003-05-10 14:26 ` Kai Großjohann
  2003-05-10 16:17   ` Charles Muller
                     ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Kai Großjohann @ 2003-05-10 14:26 UTC


Gaoyan Xie <gxie@eecs.wsu.edu> writes:

> I am trying to explore GNU Emacs's multilingual support, and what I
> want is to display and input Chinese characters. Have any of you
> done this before? I tried following the GNU Emacs online manual, but
> still couldn't make it work. BTW, I am using Red Hat Linux 7.2 and GNU
> Emacs 20.7.

I don't know anything about Chinese support in general.  But with
Emacs, it was very easy.

I compiled and installed Emacs and I also installed some Chinese
fonts.  (The GNU intlfonts package, available from ftp.gnu.org, is a
good starting point.)

Then I typed M-x view-hello-file RET.  This showed me some Chinese
(and Japanese, and Korean) characters.  If you see empty boxes
instead of the Chinese characters, then some fonts are missing.

Then I typed C-\ chinese-py RET to select a Pinyin input method.

Then I typed nihao and saw two Chinese characters.
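
The same setup can be made permanent in ~/.emacs. A minimal sketch, assuming Chinese fonts (for example from intlfonts) are already installed; `default-input-method` is the variable that `C-\` (`toggle-input-method`) consults, so setting it avoids the prompt.

```elisp
;; Make C-\ switch straight to the Pinyin input method.
(setq default-input-method "chinese-py")

;; Font check, equivalent to M-x view-hello-file RET.  Empty boxes in the
;; Chinese, Japanese or Korean lines indicate missing fonts.
;; (view-hello-file)
```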
-- 
file-error; Data: (Opening input file no such file or directory ~/.signature)

* Re: Chinese characters support
       [not found] ` <mailman.5739.1052375326.21513.help-gnu-emacs@gnu.org>
  2003-05-08  7:33   ` Robin Hu
@ 2003-05-10 14:28   ` Kai Großjohann
  1 sibling, 0 replies; 41+ messages in thread
From: Kai Großjohann @ 2003-05-10 14:28 UTC


Charles Muller <acmuller@gol.com> writes:

> When it comes to working with UTF-8, I have never heard of anyone
> succeeding in displaying East Asian scripts without installing the
> TEI-Emacs add-on.

The CVS version of Emacs has utf-translate-cjk-mode, which allows me
to do this:

C-x C-f /some/nonexisting/file/name RET
C-u C-\ chinese-py RET
nihao                   (enter Chinese here)
C-x RET c utf-8 RET
C-x C-s

After this, I get a UTF-8 encoded file with Chinese characters in it.
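
Roughly the same steps in Lisp, as a sketch rather than Kai's exact procedure. It assumes an Emacs that either provides `utf-translate-cjk-mode` (the `fboundp` guard skips it otherwise) or handles CJK in UTF-8 natively, assumes that typing `nihao` produced 你好, and writes to a hypothetical file name.

```elisp
;; Enable the CJK translation tables where this Emacs provides them.
(when (fboundp 'utf-translate-cjk-mode)
  (utf-translate-cjk-mode 1))

;; Save two Chinese characters as UTF-8; this mirrors
;; C-x RET c utf-8 RET C-x C-s, which binds the output coding system.
(with-temp-buffer
  (insert "你好")
  (let ((coding-system-for-write 'utf-8))
    (write-region (point-min) (point-max) "/tmp/nihao-utf8.txt")))
```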

utf-translate-cjk-mode used to be called utf-translate-cjk.  I don't
know when it appeared in Emacs.  Probably it isn't in 21.3.

* Re: Chinese characters support
  2003-05-10 14:26 ` Chinese characters support Kai Großjohann
@ 2003-05-10 16:17   ` Charles Muller
  2003-05-10 16:45     ` Kai Großjohann
                       ` (2 more replies)
  2003-05-12 23:05   ` Michael Na Li
       [not found]   ` <mailman.5922.1052583563.21513.help-gnu-emacs@gnu.org>
  2 siblings, 3 replies; 41+ messages in thread
From: Charles Muller @ 2003-05-10 16:17 UTC
  Cc: help-gnu-emacs

Kai wrote:

> Then I typed M-x view-hello-file RET.  This showed me some Chinese
> (and Japanese, and Korean) characters.  If you see empty boxes
> instead of the Chinese characters, then some fonts are missing.

It should be pointed out, nonetheless, that it is a bad idea to
cite the hello file as an example of international script functionality,
since it is set in an encoding that virtually no one ever uses (at least in
the CJK world), and it is quite often the case that that file will display
fine despite the fact that CJK won't work in utf-8 or native East Asian
encodings. Someone should either get rid of that file or save it in a
relevant encoding.

Chuck

* Re: Chinese characters support
  2003-05-10 16:17   ` Charles Muller
@ 2003-05-10 16:45     ` Kai Großjohann
  2003-05-10 17:31       ` Charles Muller
       [not found]       ` <mailman.5927.1052587973.21513.help-gnu-emacs@gnu.org>
  2003-05-10 17:58     ` Eli Zaretskii
       [not found]     ` <mailman.5936.1052589798.21513.help-gnu-emacs@gnu.org>
  2 siblings, 2 replies; 41+ messages in thread
From: Kai Großjohann @ 2003-05-10 16:45 UTC
  Cc: help-gnu-emacs

Charles Muller <acmuller@gol.com> writes:

> Kai wrote:
>
>> Then I typed M-x view-hello-file RET.  This showed me some Chinese
>> (and Japanese, and Korean) characters.  If you see empty boxes
>> instead of the Chinese characters, then some fonts are missing.
>
> It should be pointed out, nonetheless, that it is a bad idea to cite
> the hello file as an example of international script functionality,
> since it is set in an encoding that virtually no one ever uses (at
> least in the CJK world),

Really?  The HELLO file shows characters from a lot of different
encodings, and if used as such, then it is quite useful.

> and it is quite often the case that that file will display fine
> despite the fact that CJK won't work in utf-8

There are known problems with CJK support in UTF-8, but the situation
has improved greatly in the development version of Emacs.

> or native East Asian encodings. 

Can you cite examples?  I have had no problem with gb2312 and Chinese
characters, at least.  Others routinely use Shift-JIS and EUC-JP for
Japanese, I gather.

> Someone should either get rid of that file or save it in a relevant
> encoding.

The file is in a relevant encoding: it's the encoding used by Emacs
internally.  (Or rather, an encoding close to the internal encoding.)

This fact has its disadvantages, but it also has advantages.

* Re: Chinese characters support
  2003-05-10 16:45     ` Kai Großjohann
@ 2003-05-10 17:31       ` Charles Muller
  2003-05-10 18:43         ` Eli Zaretskii
  2003-05-10 19:24         ` Kai Großjohann
       [not found]       ` <mailman.5927.1052587973.21513.help-gnu-emacs@gnu.org>
  1 sibling, 2 replies; 41+ messages in thread
From: Charles Muller @ 2003-05-10 17:31 UTC
  Cc: help-gnu-emacs

Kai wrote:

> Really?  The HELLO file shows characters from a lot of different
> encodings, and if used as such, then it is quite useful.
> 
> > and it is quite often the case that that file will display fine
> > despite the fact that CJK won't work in utf-8
> 
> There are known problems with CJK support in UTF-8, but the situation
> has improved greatly in the development version of Emacs.

I know that, and I am not contesting that point. But again, the HELLO file
is not a utf-8 file. It is also not a form of JIS or other East Asian
encoding, so the fact that one can display multilingual scripts by opening
that file does not mean that they will be able to display them in Big5, JIS,
or whatever. If you check the archives for "utf-8+cjk", you will see that we have had a few
threads in the past year that dealt with problems trying to display CJK and
other international scripts, in which the advice was given to look at the
Hello file. As a person who has been working with international scripts and
utf-8 for a number years, I know firsthand the ability to be able to read
this file doesn't usually mean much. People who recommend checking this file
are usually people who don't use double-byte East Asian languages.

> The file is in a relevant encoding: it's the encoding used by Emacs
> internally.  (Or rather, an encoding close to the internal encoding.)

Relevant to whom? It's not in utf-8, right? Most of the problems people have
been having with CJK display in Emacs (at least until the appearance of
21.3.50) have to do with problems getting utf-8 to work, and the hello file
will still display even when these problems are not resolved.

No one that I know who works in XML or with East Asian international scripts
works in utf-7, so while that encoding format may be relevant for those who
are programming Emacs internally, it is not relevant for anyone using Emacs
to do multilingual XML or HTML publication, because no one uses it. That's
what I mean when I say "not relevant."

It is not my purpose to badmouth Emacs handling of Unicode. I know that
people have been working very hard to resolve these problems, and from what
I have been hearing, once everyone has copies of 21.3.50 installed with the
right Mule setup, this will be a past issue. Hopefully, somewhere along the
line, the Hello file will also graduate to utf-8.

Regards,

Chuck

* Re: Chinese characters support
  2003-05-10 16:17   ` Charles Muller
  2003-05-10 16:45     ` Kai Großjohann
@ 2003-05-10 17:58     ` Eli Zaretskii
       [not found]     ` <mailman.5936.1052589798.21513.help-gnu-emacs@gnu.org>
  2 siblings, 0 replies; 41+ messages in thread
From: Eli Zaretskii @ 2003-05-10 17:58 UTC


> Date: Sun, 11 May 2003 01:17:25 +0900 (JST)
> Newsgroups: gnu.emacs.help
> From: Charles Muller <acmuller@gol.com>
> 
> It should be pointed out, nonetheless, that it is a bad idea to
> cite the hello file as an example of international script functionality,
> since it is set in an encoding that virtually no one ever uses (at least in
> the CJK world)

It is certainly useful to see whether Emacs is set up correctly for
its non-ASCII support, including coding systems, fonts, and other
facilities.  Whether other software understands the way that file was
encoded is irrelevant for this.

> Someone should either get rid of that file or save it in a
> relevant encoding.

Until Emacs supports the full range of Unicode characters, the
encoding used now to save etc/HELLO is about _the_only_ one that can
do the job.  Let me remind you that in the released versions of Emacs,
only a subset of the BMP is supported.

* Re: Chinese characters support
  2003-05-10 17:31       ` Charles Muller
@ 2003-05-10 18:43         ` Eli Zaretskii
  2003-05-11  2:11           ` Charles Muller
  2003-05-10 19:24         ` Kai Großjohann
  1 sibling, 1 reply; 41+ messages in thread
From: Eli Zaretskii @ 2003-05-10 18:43 UTC


> Date: Sun, 11 May 2003 02:31:49 +0900 (JST)
> From: Charles Muller <acmuller@gol.com>
> 
> the HELLO file
> is not a utf-8 file. It is also not a form of JIS or other East Asian
> encoding, so the fact that one can display multilingual scripts by opening
> that file does not mean that they will be able to display them in Big5, JIS,
> or whatever.

It does demonstrate that Emacs can display, read, and write Chinese
characters, Japanese characters, and other characters.  UTF-8 is not
the only way to do that, and there's lots of other non-trivial
machinery, both inside Emacs and outside it, that should be set up
correctly for it to be able to display etc/HELLO, even without UTF-8.

> If you check the archives for "utf-8+cjk", you will see that we have had a few
> threads in the past year that dealt with problems trying to display CJK and
> other international scripts, in which the advice was given to look at the
> Hello file.

If you read the archives of this forum (and of gnu.emacs.bug), you
will see that it's been recommended _a_lot_.

> It is not my purpose to badmouth Emacs handling of Unicode.

However, you've actually done precisely that.

* Re: Chinese characters support
  2003-05-10 17:31       ` Charles Muller
  2003-05-10 18:43         ` Eli Zaretskii
@ 2003-05-10 19:24         ` Kai Großjohann
  2003-05-11  2:15           ` Charles Muller
       [not found]           ` <mailman.5956.1052619415.21513.help-gnu-emacs@gnu.org>
  1 sibling, 2 replies; 41+ messages in thread
From: Kai Großjohann @ 2003-05-10 19:24 UTC
  Cc: help-gnu-emacs

Charles Muller <acmuller@gol.com> writes:

> Kai wrote:
>
>> Really?  The HELLO file shows characters from a lot of different
>> encodings, and if used as such, then it is quite useful.
>> 
>> > and it is quite often the case that that file will display fine
>> > despite the fact that CJK won't work in utf-8
>> 
>> There are known problems with CJK support in UTF-8, but the situation
>> has improved greatly in the development version of Emacs.
>
> I know that, and I am not contesting that point. But again, the
> HELLO file is not a utf-8 file. It is also not a form of JIS or
> other East Asian encoding, so the fact that one can display
> multilingual scripts by opening that file does not mean that they
> will be able to display them in Big5, JIS, or whatever. If you check
> the archives for "utf-8+cjk", you will see that we have had a few
> threads in the past year that dealt with problems trying to display
> CJK and other international scripts, in which the advice was given
> to look at the Hello file. As a person who has been working with
> international scripts and utf-8 for a number of years, I know firsthand
> that being able to read this file doesn't usually mean
> much. People who recommend checking this file are usually people who
> don't use double-byte East Asian languages.

I know that I often suggest that people have a look at the HELLO file.
I do that to answer one question: does Emacs find the right fonts?

While a correct-looking HELLO file is not sufficient for correct
functioning of CJK, it is a requirement.  That is, if HELLO looks
bad, that needs to be fixed first.
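
Later Emacs releases also allow a per-character version of this font check. A sketch, assuming `char-displayable-p` is available (it is not necessarily present in the 21.x versions discussed in this thread); evaluate each form with `M-:` in the frame where the problem shows up.

```elisp
;; Each form returns non-nil if the selected frame can display the character.
(char-displayable-p ?你)   ; a Chinese character
(char-displayable-p ?あ)   ; a Japanese hiragana character
```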

However, I don't use CJK myself (much), and therefore I am quite
ready to be proven wrong.  Is it common to have a garbled HELLO file
but still CJK is working right?
-- 
This line is not blank.

* Re: Chinese characters support
  2003-05-10 18:43         ` Eli Zaretskii
@ 2003-05-11  2:11           ` Charles Muller
  2003-05-11  3:32             ` Eli Zaretskii
  0 siblings, 1 reply; 41+ messages in thread
From: Charles Muller @ 2003-05-11  2:11 UTC


Eli wrote:

> If you read the archives of this forum (and of gnu.emacs.bug), you
> will see that it's been recommended _a_lot_.
> 
> > It is not my purpose to badmouth Emacs handling of Unicode.
> 
> However, you've actually done precisely that.

All I'm trying to do, as a person who does a lot of work with CJK and utf-8,
is point out something that can be done a little better. I can't understand
why you guys can't simply say "hmm.. perhaps this is something we should
look into." What's the big deal?

Chuck

* Re: Chinese characters support
  2003-05-10 19:24         ` Kai Großjohann
@ 2003-05-11  2:15           ` Charles Muller
  2003-05-11  3:34             ` Eli Zaretskii
       [not found]           ` <mailman.5956.1052619415.21513.help-gnu-emacs@gnu.org>
  1 sibling, 1 reply; 41+ messages in thread
From: Charles Muller @ 2003-05-11  2:15 UTC


Kai wrote:

> However, I don't use CJK myself (much), and therefore I am quite
> ready to be proven wrong.  Is it common to have a garbled HELLO file
> but still CJK is working right?

No, it is exactly the opposite. It is common for one to be able to display
the HELLO file without problems, but to still have difficulty displaying CJK
in other encodings, especially UTF-8. 

Chuck

* Re: Chinese characters support
  2003-05-11  2:11           ` Charles Muller
@ 2003-05-11  3:32             ` Eli Zaretskii
  2003-05-11 13:59               ` Charles Muller
       [not found]               ` <mailman.5976.1052661651.21513.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 41+ messages in thread
From: Eli Zaretskii @ 2003-05-11  3:32 UTC
  Cc: help-gnu-emacs

> Date: Sun, 11 May 2003 11:11:48 +0900 (JST)
> From: Charles Muller <acmuller@gol.com>
> 
> Eli wrote:
> 
> > If you read the archives of this forum (and of gnu.emacs.bug), you
> > will see that it's been recommended _a_lot_.
> > 
> > > It is not my purpose to badmouth Emacs handling of Unicode.
> > 
> > However, you've actually done precisely that.
> 
> All I'm trying to do, as a person who does a lot of work with CJK and utf-8,
> is point out something that can be done a little better. I can't understand
> why you guys can't simply say "hmm.. perhaps this is something we should
> look into." What's the big deal?

Is anything but complete acceptance of your opinions going to convince
you that we know what we are talking about?

Look, all I was trying to do, as a person who did some work in this
area for Emacs, and as someone who does get to answer lots of
questions about this, is to tell you that etc/HELLO _is_ useful, and
that the last thing I'd expect Emacs maintainers to do is to remove
it.  Why cannot you accept that?

It goes without saying that in a Unicode Emacs, the one that will have
its internal representation of characters based on Unicode codepoints,
HELLO will be recoded, either in that internal representation or in
UTF-8.  But until that happens, I don't see any reason to recode it.

* Re: Chinese characters support
  2003-05-11  2:15           ` Charles Muller
@ 2003-05-11  3:34             ` Eli Zaretskii
  0 siblings, 0 replies; 41+ messages in thread
From: Eli Zaretskii @ 2003-05-11  3:34 UTC


> Date: Sun, 11 May 2003 11:15:32 +0900 (JST)
> From: Charles Muller <acmuller@gol.com>
> 
> No, it is exactly the opposite. It is common for one to be able to display
> the HELLO file without problems, but to still have difficulty displaying CJK
> in other encodings, especially UTF-8. 

So etc/HELLO is not the ultimate solution for testing how well CJK is
handled.  So what?  It certainly helps to test certain aspects of
that.  And for a file that requires near-zero maintenance, that's a
lot.

* Re: Chinese characters support
  2003-05-11  3:32             ` Eli Zaretskii
@ 2003-05-11 13:59               ` Charles Muller
       [not found]               ` <mailman.5976.1052661651.21513.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 41+ messages in thread
From: Charles Muller @ 2003-05-11 13:59 UTC
  Cc: help-gnu-emacs

Eli wrote:
 
> Is anything but complete acceptance of your opinions going to convince
> you that we know what we are talking about?

It seems like this discussion has expanded out of proportion.

> Look, all I was trying to do, as a person who did some work in this
> area for Emacs, and as someone who does get to answer lots of
> questions about this, is to tell you that etc/HELLO _is_ useful, and
> that the last thing I'd expect Emacs maintainers to do is to remove
> it.  Why cannot you accept that?

I never requested the removal of the HELLO file. All I said was that as long
as it was maintained in utf-7, it was not especially useful as a test file for
people who are trying to get their CJK working right.

> It goes without saying that in a Unicode Emacs, the one that will have
> its internal representation of characters based on Unicode codepoints,
> HELLO will be recoded, either in that internal representation or in
> UTF-8.  

This seems like a good approach. I appreciate your efforts to
understand the issue.

> But until that happens, I don't see any reason to recode it.

Whatever. Hopefully, as Emacs 21.3.50 with the appropriate Mule settings is
widely proliferated, this will be a moot issue. Emacs is a superb piece of
work, and I appreciate the efforts of all its developers to continually
expand its versatility.

Chuck

* Re: Chinese characters support
       [not found]       ` <mailman.5927.1052587973.21513.help-gnu-emacs@gnu.org>
@ 2003-05-12 19:27         ` Jason Rumney
  2003-05-13  7:40         ` Lee Sau Dan
  1 sibling, 0 replies; 41+ messages in thread
From: Jason Rumney @ 2003-05-12 19:27 UTC


Charles Muller <acmuller@gol.com> writes:

> If you check the archives for "utf-8+cjk", you will see that we have
> had a few threads in the past year that dealt with problems trying
> to display CJK and other international scripts, in which the advice
> was given to look at the Hello file. As a person who has been
> working with international scripts and utf-8 for a number of years, I
> know firsthand that being able to read this file doesn't
> usually mean much.

If HELLO displays correctly, it means that Emacs has all it needs in
order to display characters, and the problem lies in the
encoding/decoding process. If we recoded HELLO into UTF-8, then if
people were having problems displaying utf-8 encoded text, looking at
HELLO would just not work. At least now, it is a way to quickly
narrow down the problem.
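
Once HELLO looks right, checking the decoding side might look like the sketch below, run in the buffer that shows the garbled text. It assumes an Emacs that has `revert-buffer-with-coding-system` (normally bound to `C-x RET r`); `utf-8` is just an example choice.

```elisp
;; Show which coding system Emacs used when it decoded the visited file.
(message "Decoded as: %s" buffer-file-coding-system)

;; If that guess was wrong, re-read the file with an explicit coding system.
;; Emacs may ask for confirmation before reverting.
(revert-buffer-with-coding-system 'utf-8)
```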

> People who recommend checking this file are usually people who don't
> use double-byte East Asian languages.

No, they are people that know how Emacs works, and are trying to help
by narrowing down the problem.

* Re: Chinese characters support
       [not found]               ` <mailman.5976.1052661651.21513.help-gnu-emacs@gnu.org>
@ 2003-05-12 19:29                 ` Jason Rumney
  2003-05-12 19:58                 ` Kai Großjohann
  2003-05-13  7:40                 ` Lee Sau Dan
  2 siblings, 0 replies; 41+ messages in thread
From: Jason Rumney @ 2003-05-12 19:29 UTC


Charles Muller <acmuller@gol.com> writes:

> I never requested the removal of the HELLO file. All I said was that as long
> as it was maintained in utf-7,

It has never been maintained in utf-7, and no suggestion has ever
been made to encode it in utf-7. The current encoding is based on
iso-2022 AFAIK.

* Re: Chinese characters support
       [not found]           ` <mailman.5956.1052619415.21513.help-gnu-emacs@gnu.org>
@ 2003-05-12 19:56             ` Kai Großjohann
  2003-05-13  3:36               ` Charles Muller
       [not found]               ` <mailman.6084.1052797097.21513.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 41+ messages in thread
From: Kai Großjohann @ 2003-05-12 19:56 UTC


Charles Muller <acmuller@gol.com> writes:

> Kai wrote:
>
>> However, I don't use CJK myself (much), and therefore I am quite
>> ready to be proven wrong.  Is it common to have a garbled HELLO file
>> but still CJK is working right?
>
> No, it is exactly the opposite. It is common for one to be able to
> display the HELLO file without problems, but to still have
> difficulty displaying CJK in other encodings, especially UTF-8.

Well, that's good, then.  The HELLO file does exactly what I need: it
helps me to narrow down the area where people are having problems.
If HELLO displays fine, then the problem is not the fonts.

Maybe HELLO does not do what you need, but that doesn't make it
useless in general, IMVHO.

* Re: Chinese characters support
       [not found]               ` <mailman.5976.1052661651.21513.help-gnu-emacs@gnu.org>
  2003-05-12 19:29                 ` Jason Rumney
@ 2003-05-12 19:58                 ` Kai Großjohann
  2003-05-13  7:40                 ` Lee Sau Dan
  2 siblings, 0 replies; 41+ messages in thread
From: Kai Großjohann @ 2003-05-12 19:58 UTC


Charles Muller <acmuller@gol.com> writes:

> I never requested the removal of the HELLO file. All I said was that as long
> as it was maintained in utf-7,

(It's not in utf-7.  Though it's not relevant here.)

> it was not especially useful as a test file for people who are trying
> to get their CJK working right.

It's *very* useful.  It halves the problem space.

* Re: Chinese characters support
  2003-05-10 14:26 ` Chinese characters support Kai Großjohann
  2003-05-10 16:17   ` Charles Muller
@ 2003-05-12 23:05   ` Michael Na Li
  2003-05-13  7:02     ` Kai Großjohann
       [not found]   ` <mailman.5922.1052583563.21513.help-gnu-emacs@gnu.org>
  2 siblings, 1 reply; 41+ messages in thread
From: Michael Na Li @ 2003-05-12 23:05 UTC


On 10 May 2003, Kai Großjohann spake thusly:

>  Gaoyan Xie <gxie@eecs.wsu.edu> writes:
>  
> >  I am trying to explore GNU Emacs's multilingual support, and what I
> >  want is to display and input Chinese characters. Have any of you
> >  done this before? I tried following the GNU Emacs online manual, but
> >  still couldn't make it work. BTW, I am using Red Hat Linux 7.2 and GNU
> >  Emacs 20.7.
>  
>  I don't know anything about Chinese support in general.  But with
>  Emacs, it was very easy.
>  
>  I compiled and installed Emacs and I also installed some Chinese
>  fonts.  (The GNU intlfonts package, available from ftp.gnu.org, is a
>  good starting point.)
>  
>  Then I typed M-x view-hello-file RET.  This showed me some Chinese
>  (and Japanese, and Korean) characters.  If you see empty boxes
>  instead of the Chinese characters, then some fonts are missing.
>  
>  Then I typed C-\ chinese-py RET to select a Pinyin input method.

Don't you need M-x set-language-environment RET Chinese-GB RET such that the
file is saved in gb2312 coding?

The chinese-py-punct input method also lets you enter Chinese-style
punctuation.
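
Michael's suggestion written as a ~/.emacs fragment, a minimal sketch. The language-environment and input-method names are the ones given above; setting `default-input-method` explicitly just makes the choice of `chinese-py-punct` independent of whatever default the language environment selects.

```elisp
;; Same as M-x set-language-environment RET Chinese-GB RET: among other
;; things, this makes GB2312 (chinese-iso-8bit) the preferred coding system.
(set-language-environment "Chinese-GB")

;; Pinyin input that also offers Chinese-style punctuation.
(setq default-input-method "chinese-py-punct")
```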

Michael

* Re: Chinese characters support
  2003-05-12 19:56             ` Kai Großjohann
@ 2003-05-13  3:36               ` Charles Muller
  2003-05-14  3:14                 ` Eli Zaretskii
       [not found]               ` <mailman.6084.1052797097.21513.help-gnu-emacs@gnu.org>
  1 sibling, 1 reply; 41+ messages in thread
From: Charles Muller @ 2003-05-13  3:36 UTC
  Cc: help-gnu-emacs

Kai wrote:

> Maybe HELLO does not do what you need, but that doesn't make it
> useless in general, IMVHO.

I never said it was useless in general, and have never
suggested that the HELLO file should be relegated to oblivion. 

One more time:

Since the HELLO file is used for internal testing by Emacs coders, it almost
always works correctly in any recent Emacs "out of the box."

The common misunderstanding occurs when people who are trying to get
CJK working in utf-8 write to this or another list for help, and list
members, in the spirit of trying to be helpful, suggest that all is fine if
the HELLO file displays right.

But since the HELLO file is encoded in iso-2022 (not utf-7, as I originally
stated), it is the case, in my fairly extensive experience with the matter,
that the HELLO file will almost invariably display fine while the original
problem (usually Mule-related) remains unresolved.

Since the people who usually suggest testing with the HELLO file are
those who do not regularly use CJK, they seem to be unaware of this
discrepancy, and I wanted to point this out.

It seems strange to see people react so emotionally to the exposure
of this simple point. No one is asking that the hallowed HELLO file be sent
to oblivion--although a reincarnation as utf-8 would certainly not hurt! :-)

Chuck

---------------------------
Charles Muller  <acmuller@gol.com>
Faculty of Humanities,  Toyo Gakuen University
Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net]
H-Buddhism List Editor [http://www2.h-net.msu.edu/~buddhism/]
Mobile Phone: 090-9310-1787

* Re: Chinese characters support
  2003-05-12 23:05   ` Michael Na Li
@ 2003-05-13  7:02     ` Kai Großjohann
  0 siblings, 0 replies; 41+ messages in thread
From: Kai Großjohann @ 2003-05-13  7:02 UTC


Michael Na Li <lina@u.washington.edu> writes:

> Don't you need M-x set-language-environment RET Chinese-GB RET such that the
> file is saved in gb2312 coding?

It seems to offer gb2312 by default.  That's because the characters
in the buffer are gb2312.

But if I was using Chinese all the time, I'd surely set my language
environment to Chinese-GB.  However, right now LC_CTYPE is set to
de_DE@euro...
-- 
This line is not blank.

* Re: Chinese characters support
       [not found]               ` <mailman.6084.1052797097.21513.help-gnu-emacs@gnu.org>
@ 2003-05-13  7:05                 ` Kai Großjohann
  2003-05-14  6:14                 ` Lee Sau Dan
  1 sibling, 0 replies; 41+ messages in thread
From: Kai Großjohann @ 2003-05-13  7:05 UTC


Charles Muller <acmuller@gol.com> writes:

> Since the HELLO file is used for internal testing by Emacs coders it almost
> always works correctly in any recent Emacs "out of the box." 

Ah, I see.  Actually, some people see empty boxes when they display
HELLO.  But you're right, usually it Just Works.

And you're also right in that something more is needed to test
Unicode support in Emacs.  It seems that installing Mule-UCS on Emacs
20 also makes it Just Work, more or less.  I've had less of a success
with installing Mule-UCS on Emacs 21 -- there was some double-UTF-8
encoding in messages written with Gnus.
-- 
This line is not blank.

* Re: Chinese characters support
       [not found]       ` <mailman.5927.1052587973.21513.help-gnu-emacs@gnu.org>
  2003-05-12 19:27         ` Jason Rumney
@ 2003-05-13  7:40         ` Lee Sau Dan
  2003-05-13 10:11           ` acmuller
                             ` (2 more replies)
  1 sibling, 3 replies; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-13  7:40 UTC


>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:

    Charles> I know that, and I am not contesting that point. But
    Charles> again, the HELLO file is not a utf-8 file. 

I think you're being religious.  Why must it be utf-8?


    Charles> It is also not a form of JIS or other East Asian
    Charles> encoding,

It's emacs-mule encoding --- Emacs's own representation of the
information about characters/encodings that it keeps.


    Charles> so the fact that one can display multilingual
    Charles> scripts by opening that file does not mean that they will
    Charles> be able to display them in Big5, JIS, or whatever.

If one can see  the Big5 text in that file, he  can see all other Big5
files.  If one  can see the Thai characters in that  file, he can also
see  the Thai  characters when  he  opens a  Thai text  file with  the
suitable     encoding    (the     default    if     he     has    done
set-language-environment correctly).  And so on.


    Charles> People who recommend checking this file are usually
    Charles> people who don't use double-byte East Asian languages.

Sorry, I  use Big5 very often.   And I do  recommend C-h h as  a quick
test to see if he has installed the big5 fonts correctly.  (Big5 fonts
do not come with XFree86, and many Linux distros have been ignoring the
"leim" and "intlfonts" packages for years.)


    >> The file is in a relevant encoding: it's the encoding used by
    >> Emacs internally.  (Or rather, an encoding close to the
    >> internal encoding.)

    Charles> Relevant to whom?

To Emacs.


    Charles> It's not in utf-8, right?

So what?   My .signature is  in Big5 and  it is not in  utf-8, either.
And  my .emacs file  is in  emacs-mule encoding,  which is  not utf-8,
either.  Neither are utf-16 files utf-8.

I think you're being religious when you worship utf-8.  For Chinese
text, utf-8 needs 50% more storage space than a 2-byte encoding.  I'd
rather use utf-16.  But big5 has the same storage efficiency (and is
even more efficient when you include some English text) and it is more
common.
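The size argument is easy to check. The Python 3 sketch below is not from the thread; the sample strings are arbitrary, and the `utf-16-le` codec is used so that no byte-order mark is counted. It encodes the same text with `big5`, `utf-8` and `utf-16-le` and reports the byte counts.

```python
# Byte counts for the same text under three encodings.
CODECS = ["big5", "utf-8", "utf-16-le"]  # utf-16-le: no BOM is added

def encoded_sizes(text):
    """Map each codec name to the number of bytes needed for TEXT."""
    return {codec: len(text.encode(codec)) for codec in CODECS}

pure_chinese = "中文資料庫"   # five Han characters
mixed = "XML 中文 data"      # two Han characters plus nine ASCII characters

print(encoded_sizes(pure_chinese))  # {'big5': 10, 'utf-8': 15, 'utf-16-le': 10}
print(encoded_sizes(mixed))         # {'big5': 13, 'utf-8': 15, 'utf-16-le': 22}
```

For pure Chinese, UTF-8 spends 3 bytes per character against 2 for Big5 or UTF-16, i.e. 50% more; once ASCII words or markup are mixed in, UTF-16 falls behind both.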



    Charles> No one that I know who works in XML or with East Asian
    Charles> international scripts works in utf-7,

And for XML in Chinese, utf-8  wastes lots of space.  To be practical,
we often use big5 for XML files with Chinese.


    Charles> so while that encoding format may be relevant for those
    Charles> who are programming Emacs internally, it is not relevant
    Charles> for anyone using Emacs to do multilingual XML or HTML
    Charles> publication, because no one uses it. That's what I mean
    Charles> when I say "not relevant."

My experience with Emacs's utf-8 <--> internal conversion has been
good.



-- 
Lee Sau Dan                     李守敦(Big5)                    ~{@nJX6X~}(HZ) 

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee

* Re: Chinese characters support
       [not found]   ` <mailman.5922.1052583563.21513.help-gnu-emacs@gnu.org>
@ 2003-05-13  7:40     ` Lee Sau Dan
  0 siblings, 0 replies; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-13  7:40 UTC


>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:

    Charles> Kai wrote:
    >> Then I typed M-x view-hello-file RET.  This showed me some
    >> Chinese (and Japanese, and Korean) characters.  If you see
    >> empty boxes instead of the Chinese characters, then some fonts
    >> are missing.

    Charles> It should be pointed out, nonetheless, that it is a bad
    Charles> idea to cite the hello file as an example of
    Charles> international script functionality, 

Why  not?   That  file  really illustrates  the  international  script
functionality.


    Charles> since it is set in an encoding that virtually no one ever
    Charles> uses (at least in the CJK world), 

That's  a  problem with  encoding,  not  Emacs's international  script
functionality.  Maybe,  you have "conformance to  Unicode and national
encodings" in mind when you said "international script functionality".
They're different issues.



    Charles> and it is quite often the case that that file will
    Charles> display fine despite the fact that CJK won't work in
    Charles> utf-8 or native East Asian encodings. 

C-x RET c utf-8 C-x s ... does save my Chinese text files in UTF-8.
C-x RET c big5 C-x s ... does save my Chinese text files in BIG5 --
the "native" encoding for traditional Chinese.

And needless to say, I can read  files in UTF-8 and big5 using C-x RET
c ... C-x  C-f.  (For Emacs 20, I need to  install an external package
for Unicode encodings: MuleUCS or something like that.)
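Outside Emacs, the same discipline (state the coding system explicitly when writing and again when reading, which is roughly what `C-x RET c` does for the next command) can be sketched in Python 3 with the standard `open(..., encoding=...)` argument. This is only an analogy, not a model of how Emacs converts buffers; `save_and_reload` is a hypothetical helper and the sample text is arbitrary.

```python
import os
import tempfile

TEXT = "中文測試"  # arbitrary sample text

def save_and_reload(text, coding, read_coding=None):
    """Write TEXT to a temporary file using CODING, then read it back
    using READ_CODING (default: CODING); undecodable bytes become U+FFFD."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=coding) as f:
            f.write(text)
        with open(path, "r", encoding=read_coding or coding,
                  errors="replace") as f:
            return f.read()
    finally:
        os.remove(path)

print(save_and_reload(TEXT, "utf-8"))                      # round trip
print(save_and_reload(TEXT, "big5"))                       # round trip
print(save_and_reload(TEXT, "big5", read_coding="utf-8"))  # wrong coding: garbage
```

Reading Big5 bytes as UTF-8 yields replacement characters, which is the same symptom as visiting a file with the wrong coding system.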


    Charles> Someone should either get rid of that file or save it in
    Charles> a relevant encoding.

Since no "native" encoding preserves the details that the emacs-mule
encoding saves, that "showoff" file must be kept in emacs-mule.  E.g.
the section "Difference among chinese characters in GB, JIS, KSC,
BIG5" would be impossible with Unicode, GB, JIS, KSC or BIG5.  Only
emacs-mule has enough coding space to accommodate all characters from
these encodings without unifying them, so that they keep looking
distinct.
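The unification point can be seen at the code-point level with Python 3's national codecs, assuming `gb2312`, `big5`, `shift_jis` and `euc_kr` stand in for GB, Big5, JIS and KSC. The sketch encodes one character, 中, in each: the byte sequences differ, but every one decodes to the same code point, U+4E2D, so once the text is converted to Unicode its national origin is gone. Whether the glyphs then look different is a font matter that the sketch does not model, and it says nothing about how emacs-mule stores characters.

```python
# One Han character as stored by four national standards.
NATIONAL_CODECS = ["gb2312", "big5", "shift_jis", "euc_kr"]
CHAR = "中"  # U+4E2D

def national_bytes(char):
    """Map each national codec to the bytes it uses for CHAR."""
    return {codec: char.encode(codec) for codec in NATIONAL_CODECS}

encoded = national_bytes(CHAR)
for codec, raw in encoded.items():
    decoded = raw.decode(codec)
    # Different bytes in, same Unicode code point out.
    print(f"{codec:10} {raw.hex():5} -> U+{ord(decoded):04X}")
```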




-- 
Lee Sau Dan                     李守敦(Big5)                    ~{@nJX6X~}(HZ) 

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee

* Re: Chinese characters support
       [not found]               ` <mailman.5976.1052661651.21513.help-gnu-emacs@gnu.org>
  2003-05-12 19:29                 ` Jason Rumney
  2003-05-12 19:58                 ` Kai Großjohann
@ 2003-05-13  7:40                 ` Lee Sau Dan
  2003-05-13  9:57                   ` acmuller
  2003-05-13 10:02                   ` Robin Hu
  2 siblings, 2 replies; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-13  7:40 UTC


>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:

    Charles> I never requested the removal of the HELLO file. All I
    Charles> said was that as long as it was maintained in utf-7, 

Are you sure it's utf-7?   Then how come it can distinguish characters
in BIG5 and equivalent characters in JIS?


    Charles> it was not especially useful as test file for people who
    Charles> are trying to get their CJK working right.

It seems to me that when you talk about "CJK", you're actually
referring to "utf-8".

Many people using the CJK parts of Emacs only work with the national
encodings (Big5, GB, JIS, KSC, etc.), and in those cases Emacs
works excellently.




-- 
Lee Sau Dan                     李守敦(Big5)                    ~{@nJX6X~}(HZ) 

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee

* Re: Chinese characters support
       [not found]     ` <mailman.5936.1052589798.21513.help-gnu-emacs@gnu.org>
@ 2003-05-13  7:40       ` Lee Sau Dan
  2003-05-14  3:15         ` Eli Zaretskii
       [not found]         ` <mailman.6156.1052882447.21513.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-13  7:40 UTC


>>>>> "Eli" == Eli Zaretskii <eliz@elta.co.il> writes:

    Eli> Until Emacs supports the full range of Unicode characters,
    Eli> the encoding used now to save etc/HELLO is about _the_only_
    Eli> one that can do the job.  Let me remind you that in the
    Eli> released versions of Emacs, only a subset of the BMP is
    Eli> supported.

I  don't  think  so.   Unicode  will  never  be  able  to  handle  the
"Difference among  chinese characters in GB, JIS,  KSC, BIG5:" section
in  the  etc/HELLO  file.   Unicode simply  unifies  the  "equivalent"
characters that show up differently  in that section of the HELLO file
into single code points.



-- 
Lee Sau Dan                     李守敦(Big5)                    ~{@nJX6X~}(HZ) 

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee

* Re: Chinese characters support
  2003-05-13  7:40                 ` Lee Sau Dan
@ 2003-05-13  9:57                   ` acmuller
  2003-05-13 10:02                   ` Robin Hu
  1 sibling, 0 replies; 41+ messages in thread
From: acmuller @ 2003-05-13  9:57 UTC




On 5/13/2003, Lee Sau Dan wrote:

>Are you sure it's utf-7?   Then how come it can distinguish characters
>in BIG5 and equivalent characters in JIS?

As was corrected in an earlier message, it is iso-2022, not utf-7.

>    Charles> it was not especially useful as test file for people who
>    Charles> are trying to get their CJK working right.
>
>It  seems  to me  that  when you  talk  about  "CJK", you're  actually
>referring to "utf-8".

No, I am not. But my discussion from the outset has been centered on utf-8
related problems. As you point out (and as I noted earlier in this thread)
most people are able to get CJK working with localized DBCS encodings
without too much trouble. 

I, and the people that I am collaborating with in XML-based data projects,
are all at the bleeding edge using utf-8, and thus I have had to deal with
this problem extensively in trying to get everyone's systems set up
right.

Chuck


Charles Muller <acmuller@gol.com>
Toyo Gakuen University

Digital Dictionary of Buddhism: www.acmuller.net

* Re: Chinese characters support
  2003-05-13  7:40                 ` Lee Sau Dan
  2003-05-13  9:57                   ` acmuller
@ 2003-05-13 10:02                   ` Robin Hu
  2003-05-15  8:07                     ` Lee Sau Dan
  1 sibling, 1 reply; 41+ messages in thread
From: Robin Hu @ 2003-05-13 10:02 UTC


>>>>> "Lee" == Lee Sau Dan <danlee@informatik.uni-freiburg.de> writes:

>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:

    Lee> Many people using the CJK parts of Emacs only work with the
    Lee> national encodings (Big5, GB, JIS, KSC, etc.)  and in those
    Lee> cases, Emacs works excellently.

    I think you are oversimplifying this problem. ;-( Most CJK characters
    are not encoded in either Big5 or GB or JIS or KSC; that's why the
    GB coding standard changed from gb2312 to gbk and then to gb18030.
    AFAIK, most Chinese characters also cannot be encoded within Mule,
    and the existing Unicode support does not solve this problem.

    Of course, emacs is enough for most people most of the time, but I am
    really hesitant to tell my friend again and again "Sorry, but
    your name (羽中) is not supported by my emacs."
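The growth from gb2312 to gbk to gb18030 can be illustrated by asking which of Python 3's codecs of those names can encode a few sample characters: a common one (中), a traditional form missing from GB 2312 (國), and U+20000, the first character of CJK Extension B. The characters are arbitrary examples, Python's codecs stand in for the standards themselves, and `coverage` is a hypothetical helper.

```python
GB_CODECS = ["gb2312", "gbk", "gb18030"]  # oldest standard first

def coverage(char):
    """Return the GB codecs that can encode CHAR, oldest first."""
    supported = []
    for codec in GB_CODECS:
        try:
            char.encode(codec)
        except UnicodeEncodeError:
            continue
        supported.append(codec)
    return supported

for char in ["中", "國", "\U00020000"]:
    print(f"U+{ord(char):04X}", coverage(char))
```

Each newer standard is a superset of the previous one, and only gb18030 reaches beyond the Basic Multilingual Plane.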

-- 
The goal of science is to build better mousetraps.  The goal of nature
is to build better mice.

* Re: Chinese characters support
  2003-05-13  7:40         ` Lee Sau Dan
@ 2003-05-13 10:11           ` acmuller
  2003-05-13 10:54           ` Charles Muller
       [not found]           ` <mailman.6097.1052826249.21513.help-gnu-emacs@gnu.org>
  2 siblings, 0 replies; 41+ messages in thread
From: acmuller @ 2003-05-13 10:11 UTC




On 5/13/2003, "Lee Sau Dan" <danlee@informatik.uni-freiburg.de> wrote:


>I think  you're being religious  when you worship utf-8.  

Who said anything about worship? Why the sarcasm? I am an XML developer.
UTF-8 is the standard encoding for XML documents.

See http://www.w3.org/TR/REC-xml
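XML 1.0 requires every processor to read UTF-8 and UTF-16, and, absent external information such as a transport header, an entity with neither a byte-order mark nor an encoding declaration must be UTF-8; anything else, Big5 included, has to be declared. A small Python 3 sketch with the standard `xml.etree.ElementTree` module (the element and attribute names are made up) shows both sides: an undeclared UTF-8 document parses, and when serializing, ElementTree writes a declaration for Big5 but not for UTF-8.

```python
import xml.etree.ElementTree as ET

# No XML declaration and no BOM: the parser treats the bytes as UTF-8.
doc = "<term lang='zh'>佛</term>".encode("utf-8")
root = ET.fromstring(doc)
print(root.text)

utf8_out = ET.tostring(root, encoding="utf-8")  # no declaration needed
big5_out = ET.tostring(root, encoding="big5")   # declaration is added
print(utf8_out)
print(big5_out)
```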


Chuck



Charles Muller <acmuller@gol.com>
Toyo Gakuen University

Digital Dictionary of Buddhism: www.acmuller.net

* Re: Chinese characters support
  2003-05-13  7:40         ` Lee Sau Dan
  2003-05-13 10:11           ` acmuller
@ 2003-05-13 10:54           ` Charles Muller
       [not found]           ` <mailman.6097.1052826249.21513.help-gnu-emacs@gnu.org>
  2 siblings, 0 replies; 41+ messages in thread
From: Charles Muller @ 2003-05-13 10:54 UTC


Lee Sau Dan wrote:

> And for XML in Chinese, utf-8  wastes lots of space.  To be practical,
> we often use big5 for XML files with Chinese.

That's fine, if all you are doing is Chinese. The documents in my project 
include terms from over 15 languages, including Tibetan, Nepalese, Sanskrit,
Pali, and several European languages. Unicode has codepoints for these
characters, while Big5 (and other Chinese codesets) do not.
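The mixed-script point can be made concrete in Python 3. The sketch lists the characters of a made-up glossary entry (Chinese, Tibetan and Devanagari) that the `big5` codec cannot encode, then shows the usual fallback if such text had to live in a Big5 XML file anyway: `errors="xmlcharrefreplace"` turns each unencodable character into a numeric character reference. `unencodable` is a hypothetical helper.

```python
def unencodable(text, codec):
    """Return the distinct characters in TEXT that CODEC cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing

entry = "佛 ཀ क"  # hypothetical entry: Chinese, Tibetan KA, Devanagari KA

for codec in ("big5", "utf-8"):
    print(codec, [f"U+{ord(ch):04X}" for ch in unencodable(entry, codec)])

# Big5 can only carry the non-Chinese letters as character references.
print(entry.encode("big5", errors="xmlcharrefreplace"))
```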

Chuck

---------------------------
Charles Muller  <acmuller@gol.com>
Faculty of Humanities,  Toyo Gakuen University
Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net]
H-Buddhism List Editor [http://www2.h-net.msu.edu/~buddhism/]
Mobile Phone: 090-9310-1787

* Re: Chinese characters support
  2003-05-13  3:36               ` Charles Muller
@ 2003-05-14  3:14                 ` Eli Zaretskii
  0 siblings, 0 replies; 41+ messages in thread
From: Eli Zaretskii @ 2003-05-14  3:14 UTC


> Date: Tue, 13 May 2003 12:36:28 +0900 (JST)
> Newsgroups: gnu.emacs.help
> From: Charles Muller <acmuller@gol.com>
> 
> I never said it was useless in general, and have never
> suggested that the HELLO file should be relegated to oblivion. 

Perhaps someone else in this thread did, then.

> Since the HELLO file is used for internal testing by Emacs coders it almost
> always works correctly in any recent Emacs "out of the box." 

The ability to display HELLO depends on the local configuration
(fonts), so it is not guaranteed to work on every platform.

> Since the people who usually make the suggestion to test via the HELLO are
> those who do not regularly use CJK, it seems that they are not aware of this
> discrepancy, and I wanted to point this out. 

Actually, HELLO was created by people who use CJK every day.

* Re: Chinese characters support
  2003-05-13  7:40       ` Lee Sau Dan
@ 2003-05-14  3:15         ` Eli Zaretskii
       [not found]         ` <mailman.6156.1052882447.21513.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 41+ messages in thread
From: Eli Zaretskii @ 2003-05-14  3:15 UTC


> From: Lee Sau Dan <danlee@informatik.uni-freiburg.de>
> Newsgroups: gnu.emacs.help
> Date: 13 May 2003 09:40:16 +0200
> 
> >>>>> "Eli" == Eli Zaretskii <eliz@elta.co.il> writes:
> 
>     Eli> Until Emacs supports the full range of Unicode characters,
>     Eli> the encoding used now to save etc/HELLO is about _the_only_
>     Eli> one that can do the job.  Let me remind you that in the
>     Eli> released versions of Emacs, only a subset of the BMP is
>     Eli> supported.
> 
> I  don't  think  so.   Unicode  will  never  be  able  to  handle  the
> "Difference among  chinese characters in GB, JIS,  KSC, BIG5:" section
> in  the  etc/HELLO  file.

Unicode doesn't, but the Unicode Emacs will.  Trust me ;-)

* Re: Chinese characters support
       [not found]               ` <mailman.6084.1052797097.21513.help-gnu-emacs@gnu.org>
  2003-05-13  7:05                 ` Kai Großjohann
@ 2003-05-14  6:14                 ` Lee Sau Dan
  2003-05-14 16:27                   ` Kai Großjohann
  1 sibling, 1 reply; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-14  6:14 UTC


>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:

    Charles> One more time:

    Charles> Since the HELLO file is used for internal testing by
    Charles> Emacs coders it almost always works correctly in any
    Charles> recent Emacs "out of the box."

No.  If you have problems with the font installation (esp. when none
of your font servers offer the relevant fonts or your sys. admin.
simply doesn't care about your non-English needs), HELLO won't display
the glyphs.  It only displays boxes there.


    Charles> The common misunderstanding occurs when people who are
    Charles> trying to get CJK working in utf-8 write to this, or
    Charles> another list for help, and list members, in the spirit of
    Charles> trying to be helpful, suggest that all is fine if the
    Charles> HELLO file displays right.

For utf-8 testing, I'd refer someone  to the test files in the MuleUCS
package.



    Charles> Since the people who usually make the suggestion to test
    Charles> via the HELLO are those who do not regularly use CJK, it
    Charles> seems that they are not aware of this discrepancy, and I
    Charles> wanted to point this out.

No.  Those people often use CJK regularly.  They just don't use utf-8.
Like me (using Big5), they use a national encoding (e.g. GB2312, JIS,
KSC).


    Charles> It seems strange to see people react so emotionally to
    Charles> the exposure of this simple point. No one is asking that
    Charles> the hallowed HELLO file be sent to oblivion--although a
    Charles> reincarnation as utf-8 would certainly not hurt! :-)

That WILL certainly HURT.  Look carefully at the section "Difference
among chinese characters in GB, JIS, KSC, BIG5:" in HELLO.  The same
thing cannot be reproduced in vanilla utf-8, because Unicode unifies
the various characters in these encodings into one single code point.
(Most efforts in the earlier versions of Unicode were devoted to
_unifying_ characters from different languages, employing different
national encodings.  The result is that you can no longer tell whether
a unified character is Korean, Japanese or Chinese, even though each
writes it in a slightly different way.)


If you want to test UTF-8 (Why not UTF-16?  People who really use
computers for Far East languages (CJK) would need 50% more disk
space if they use UTF-8 to store their text files.  UTF-16 is more
space efficient.), do suggest including a UTF-8 test file.  (Add a
line in HELLO to instruct anyone how to open the UTF-8 test file,
preferably with hot-key bindings.)  And why stop there?  Also have
UTF-16 and UTF-7 test files.  UTF-8 is simply NOT the magic panacea.
It sucks when you have a file full of Chinese characters, for
instance.  The 3-byte per Chinese character "feature" of UTF-8 sucks.

HELLO should remain a test file for the internal encoding "emacs-mule"
and for  displaying the true  multilingual capabilities of  Emacs.  It
has also been serving well to test font installation.  It should never
be recoded in utf-8, IMO.  If all you care about is UTF-8, have
another test file.  Assuming that all CJK users should use UTF-8 is
like assuming that everyone should swear faith to the Vatican.


-- 
Lee Sau Dan                     李守敦(Big5)                    ~{@nJX6X~}(HZ) 

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee

* Re: Chinese characters support
  2003-05-14  6:14                 ` Lee Sau Dan
@ 2003-05-14 16:27                   ` Kai Großjohann
  2003-05-14 21:07                     ` Jason Rumney
  0 siblings, 1 reply; 41+ messages in thread
From: Kai Großjohann @ 2003-05-14 16:27 UTC


Lee Sau Dan <danlee@informatik.uni-freiburg.de> writes:

> If you want to test UTF-8 (Why not UTF-16?  People who really use
> computers for Far East languages (CJK) would need 50% more disk
> space if they use UTF-8 to store their text files.  UTF-16 is more
> space efficient.), do suggest including a UTF-8 test file.  (Add a
> line in HELLO to instruct anyone how to open the UTF-8 test file,
> preferably with hot-key bindings.)  And why stop there?  Also have
> UTF-16 and UTF-7 test files.  UTF-8 is simply NOT the magic panacea.
> It sucks when you have a file full of Chinese characters, for
> instance.  The 3-byte per Chinese character "feature" of UTF-8 sucks.

Why not include UTF-8 characters in the HELLO file?  I gather that
iso-2022 is general enough to also allow using UTF-8 as one of the
encodings it supports.

But I'm not an expert.
-- 
This line is not blank.

* Re: Chinese characters support
  2003-05-14 16:27                   ` Kai Großjohann
@ 2003-05-14 21:07                     ` Jason Rumney
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Rumney @ 2003-05-14 21:07 UTC


kai.grossjohann@gmx.net (Kai Großjohann) writes:

> Why not include UTF-8 characters in the HELLO file?  I gather that
> iso-2022 is general enough to also allow using UTF-8 as one of the
> encodings it supports.

compound-text-with-extensions does, but not pure iso-2022 AFAIK.
Anyway, HELLO is in emacs-mule encoding, which is a 16-bit encoding
based on iso-2022, so only supports a certain fixed number of
character sets (most iso-2022 based encodings only support 2 or 3
character sets).  CVS Emacs does contain some unicode text in HELLO,
but only a subset of Unicode is supported without conversion (hence
the existence of `utf-translate-cjk-mode' which causes the large
translation tables to be loaded).
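The escape-sequence mechanism behind ISO-2022 can be seen with Python 3's ISO-2022 codecs. In the sketch below (sample strings are arbitrary; it illustrates ISO-2022 in general, not the emacs-mule format), `iso2022_jp` switches into JIS X 0208 with `ESC $ B` before 中 and back to ASCII with `ESC ( B`, while `iso2022_jp_2` additionally designates KS C 5601 with `ESC $ ( C` for a Hangul syllable. Each codec only knows the fixed set of character sets it was defined with.

```python
# ISO-2022 encodings switch character sets with escape sequences.
jp = "A中".encode("iso2022_jp")
print(jp)   # ESC $ B selects JIS X 0208, ESC ( B returns to ASCII

jp2 = "中한".encode("iso2022_jp_2")
print(jp2)  # the Hangul syllable needs KS C 5601, designated by ESC $ ( C

# Both round-trip back to the original text.
print(jp.decode("iso2022_jp"), jp2.decode("iso2022_jp_2"))
```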

* Re: Chinese characters support
  2003-05-13 10:02                   ` Robin Hu
@ 2003-05-15  8:07                     ` Lee Sau Dan
  0 siblings, 0 replies; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-15  8:07 UTC


>>>>> "Robin" == Robin Hu <huxw@knight.6test.edu.cn> writes:

>>>>> "Lee" == Lee Sau Dan <danlee@informatik.uni-freiburg.de> writes:
>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:

    Lee> Many people using the CJK parts of Emacs only work with the
    Lee> national encodings (Big5, GB, JIS, KSC, etc.)  and in those
    Lee> cases, Emacs works excellently.

    Robin>     I think you are oversimplifying this problem. ;-( Most
    Robin> CJK characters are not encoded in either Big5 or GB or JIS
    Robin> or KSC; that's why the GB coding standard changed from
    Robin> gb2312 to gbk and then to gb18030.

It depends on what you mean by "most".  Yes, if you include those 10s
of thousands of *rare* characters, then even Unicode can fall short.
Most Chinese text, for instance, uses only around 5000 distinct
characters, of which around 1000 account for more than 90% of the
characters in a text.  Big5 is quite sufficient for normal use.  If
not, the Chinese people would have thrown it away (e.g. in favour of
Unicode).  Similarly, Japanese texts employ around 3000 distinct
characters, and there is a government standard list of characters to
use.  Characters outside that list should, in theory, be avoided.
The characters in JIS are based on this set, AFAIK.


    Robin> AFAIK, most Chinese characters also cannot be encoded within
    Robin> Mule, and the existing Unicode support does not solve this
    Robin> problem.

As long as 99.99% of the characters that I need for Chinese text files
can be encoded in Big5 and emacs-mule, what's the problem?


    Robin>     Of course, emacs is enough for most people most of
    Robin> the time, but I am really hesitant to tell my friend again
    Robin> and again "Sorry, but your name (羽中) is not supported by
    Robin> my emacs."

No, that's not my name.  I  think Gnus sets the charset of my postings
to big5.

And  which Emacs  is  your emacs?   Emacs  since version  20 has  been
displaying  Chinese  (I can't  speak  for  Japanese  and Korean)  very
satisfactorily.  And  I find  it, together with  Gnus, to be  the most
practical tool on Linux to read/write Chinese files/news/mails.



-- 
Lee Sau Dan                     李守敦(Big5)                    ~{@nJX6X~}(HZ) 

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee

* Re: Chinese characters support
       [not found]           ` <mailman.6097.1052826249.21513.help-gnu-emacs@gnu.org>
@ 2003-05-15  8:07             ` Lee Sau Dan
  0 siblings, 0 replies; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-15  8:07 UTC


>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:

    Charles> Lee Sau Dan wrote:
    >> And for XML in Chinese, utf-8 wastes lots of space.  To be
    >> practical, we often use big5 for XML files with Chinese.

    Charles> That's fine, if all you are doing is Chinese. The
    Charles> documents in my project include terms from over 15
    Charles> languages, including Tibetan, Nepalese, Sanskrit, Pali,
    Charles> and several European languages. Unicode has codepoints
    Charles> for these characters, while Big5 (and other Chinese
    Charles> codesets) do not.

The  emacs-mule encoding also  has code  points for  these characters.
Moreover, it can distinguish  big5 characters from JIS characters (see
the  "Difference among  chinese  characters in  GB,  JIS, KSC,  BIG5:"
section in  the HELLO file.),  while Unicode (and hence  utf-7, utf-8,
utf-16, ucs2, ucs4) cannot.


-- 
Lee Sau Dan                     李守敦(Big5)                    ~{@nJX6X~}(HZ) 

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee

* Re: Chinese characters support
       [not found]         ` <mailman.6156.1052882447.21513.help-gnu-emacs@gnu.org>
@ 2003-05-15  8:07           ` Lee Sau Dan
  2003-05-16 11:36             ` Eli Zaretskii
  0 siblings, 1 reply; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-15  8:07 UTC


>>>>> "Eli" == Eli Zaretskii <eliz@elta.co.il> writes:

    Eli> Until Emacs supports the full range of Unicode characters,
    Eli> the encoding used now to save etc/HELLO is about _the_only_
    Eli> one that can do the job.  Let me remind you that in the
    Eli> released versions of Emacs, only a subset of the BMP is
    Eli> supported.
    >>  I don't think so.  Unicode will never be able to handle the
    >> "Difference among chinese characters in GB, JIS, KSC, BIG5:"
    >> section in the etc/HELLO file.

    Eli> Unicode doesn't, but the Unicode Emacs will.  Trust me ;-)

So,  you're  agreeing  that  converting  HELLO to  utf-8  (which  only
represents Unicode) is not a good idea?  Or are you resorting to dirty
tricks using the Private Use Area?


-- 
Lee Sau Dan                     李守敦(Big5)                    ~{@nJX6X~}(HZ) 

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee

* Re: Chinese characters support
  2003-05-15  8:07           ` Lee Sau Dan
@ 2003-05-16 11:36             ` Eli Zaretskii
  0 siblings, 0 replies; 41+ messages in thread
From: Eli Zaretskii @ 2003-05-16 11:36 UTC


> From: Lee Sau Dan <danlee@informatik.uni-freiburg.de>
> Newsgroups: gnu.emacs.help
> Date: 15 May 2003 10:07:01 +0200
> 
>     >>  I don't think so.  Unicode will never be able to handle the
>     >> "Difference among chinese characters in GB, JIS, KSC, BIG5:"
>     >> section in the etc/HELLO file.
> 
>     Eli> Unicode doesn't, but the Unicode Emacs will.  Trust me ;-)
> 
> So,  you're  agreeing  that  converting  HELLO to  utf-8  (which  only
> represents Unicode) is not a good idea?

I don't know yet.  AFAIK, the issue of encoding etc/HELLO in the
Unicode Emacs was not discussed yet, but I expect it to be encoded in
the internal Emacs representation of characters, because that by
definition will support all the characters supported by Emacs, and do
that unambiguously.

end of thread

Thread overview: 41+ messages
     [not found] <mailman.5730.1052348993.21513.help-gnu-emacs@gnu.org>
2003-05-10 14:26 ` Chinese characters support Kai Großjohann
2003-05-10 16:17   ` Charles Muller
2003-05-10 16:45     ` Kai Großjohann
2003-05-10 17:31       ` Charles Muller
2003-05-10 18:43         ` Eli Zaretskii
2003-05-11  2:11           ` Charles Muller
2003-05-11  3:32             ` Eli Zaretskii
2003-05-11 13:59               ` Charles Muller
     [not found]               ` <mailman.5976.1052661651.21513.help-gnu-emacs@gnu.org>
2003-05-12 19:29                 ` Jason Rumney
2003-05-12 19:58                 ` Kai Großjohann
2003-05-13  7:40                 ` Lee Sau Dan
2003-05-13  9:57                   ` acmuller
2003-05-13 10:02                   ` Robin Hu
2003-05-15  8:07                     ` Lee Sau Dan
2003-05-10 19:24         ` Kai Großjohann
2003-05-11  2:15           ` Charles Muller
2003-05-11  3:34             ` Eli Zaretskii
     [not found]           ` <mailman.5956.1052619415.21513.help-gnu-emacs@gnu.org>
2003-05-12 19:56             ` Kai Großjohann
2003-05-13  3:36               ` Charles Muller
2003-05-14  3:14                 ` Eli Zaretskii
     [not found]               ` <mailman.6084.1052797097.21513.help-gnu-emacs@gnu.org>
2003-05-13  7:05                 ` Kai Großjohann
2003-05-14  6:14                 ` Lee Sau Dan
2003-05-14 16:27                   ` Kai Großjohann
2003-05-14 21:07                     ` Jason Rumney
     [not found]       ` <mailman.5927.1052587973.21513.help-gnu-emacs@gnu.org>
2003-05-12 19:27         ` Jason Rumney
2003-05-13  7:40         ` Lee Sau Dan
2003-05-13 10:11           ` acmuller
2003-05-13 10:54           ` Charles Muller
     [not found]           ` <mailman.6097.1052826249.21513.help-gnu-emacs@gnu.org>
2003-05-15  8:07             ` Lee Sau Dan
2003-05-10 17:58     ` Eli Zaretskii
     [not found]     ` <mailman.5936.1052589798.21513.help-gnu-emacs@gnu.org>
2003-05-13  7:40       ` Lee Sau Dan
2003-05-14  3:15         ` Eli Zaretskii
     [not found]         ` <mailman.6156.1052882447.21513.help-gnu-emacs@gnu.org>
2003-05-15  8:07           ` Lee Sau Dan
2003-05-16 11:36             ` Eli Zaretskii
2003-05-12 23:05   ` Michael Na Li
2003-05-13  7:02     ` Kai Großjohann
     [not found]   ` <mailman.5922.1052583563.21513.help-gnu-emacs@gnu.org>
2003-05-13  7:40     ` Lee Sau Dan
2003-05-07 23:08 Gaoyan Xie
2003-05-08  6:27 ` Charles Muller
     [not found] ` <mailman.5739.1052375326.21513.help-gnu-emacs@gnu.org>
2003-05-08  7:33   ` Robin Hu
2003-05-10 14:28   ` Kai Großjohann