* Chinese characters support
@ 2003-05-07 23:08 Gaoyan Xie
2003-05-08 6:27 ` Charles Muller
[not found] ` <mailman.5739.1052375326.21513.help-gnu-emacs@gnu.org>
0 siblings, 2 replies; 41+ messages in thread
From: Gaoyan Xie @ 2003-05-07 23:08 UTC (permalink / raw)
Hi all,
I am trying to explore GNU emacs's multilingual support, and what I want
is the display and input of Chinese characters. Have any of you done
this before? I tried according to GNU emacs' online manual, but still
couldn't make it work. BTW, I am using Redhat Linux 7.2 and GNU emacs 20.7.
Thanks for any help for this issue.
Gaoyan Xie
* Re: Chinese characters support
From: Charles Muller @ 2003-05-08 6:27 UTC (permalink / raw)
Cc: help-gnu-emacs
Gaoyan Xie wrote
> I am trying to explore GNU emacs's multilingual support, and what I want
> is the display and input of Chinese characters. Have any of you done
> this before? I tried according to GNU emacs' online manual, but still
> couldn't make it work. BTW, I am using Redhat Linux 7.2 and GNU emacs
> 20.7.
I would recommend first that you consider installing a newer 21.x version of Emacs
if you are concerned about international script support. A newer version of
RedHat would not hurt either. I am using RH9 with Emacs 21.2 and Chinese and
Japanese display without me having to do anything, as long as the documents
are encoded in JIS for Japanese and Big5 for Chinese.
When it comes to working with UTF-8, I have never heard of anyone succeeding in
displaying East Asian scripts without installing the TEI-Emacs add-on.
Chuck
---------------------------
Charles Muller <acmuller@gol.com>
Faculty of Humanities, Toyo Gakuen University
Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net]
H-Buddhism List Editor [http://www2.h-net.msu.edu/~buddhism/]
Mobile Phone: 090-9310-1787
* Re: Chinese characters support
From: Robin Hu @ 2003-05-08 7:33 UTC (permalink / raw)
>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:
Charles> Gaoyan Xie wrote
>> I am trying to explore GNU emacs's multilingual support, and what
>> I want is the display and input of Chinese characters. Have any
>> of you done this before? I tried according to GNU emacs' online
>> manual, but still couldn't make it work. BTW, I am using Redhat
>> Linux 7.2 and GNU emacs 20.7.
Charles> I would recommend first that you consider installing a
Charles> newer 21.x version of Emacs if you are concerned about
Charles> international script support. A newer version of RedHat
Charles> would not hurt either. I am using RH9 with Emacs 21.2 and
Charles> Chinese and Japanese display without me having to do
Charles> anything, as long as the documents are encoded in JIS for
Charles> Japanese and Big5 for Chinese.
Charles> When it comes to working with UTF-8, I have never heard of
Charles> anyone succeeding in displaying East Asian scripts without
Charles> installing the TEI-Emacs add-on.
I am using Mule-UCS 0.84 (with the Debian patches applied) with Emacs
21.3.50, and it seems to work fine. So what is the "TEI-Emacs add-on"?
Can you point me to related URLs?
Charles> Chuck
--
The goal of science is to build better mousetraps. The goal of nature
is to build better mice.
* Re: Chinese characters support
From: Kai Großjohann @ 2003-05-10 14:26 UTC (permalink / raw)
Gaoyan Xie <gxie@eecs.wsu.edu> writes:
> I am trying to explore GNU emacs's multilingual support, and what I
> want is the display and input of Chinese characters. Have any of you
> done this before? I tried according to GNU emacs' online manual, but
> still couldn't make it work. BTW, I am using Redhat Linux 7.2 and GNU
> emacs 20.7.
I don't know anything about Chinese support in general. But with
Emacs, it was very easy.
I compiled and installed Emacs and I also installed some Chinese
fonts. (The GNU intlfonts package, available from ftp.gnu.org, is a
good starting point.)
Then I typed M-x view-hello-file RET. This showed me some Chinese
(and Japanese, and Korean) characters. If you see empty boxes
instead of the Chinese characters, then some fonts are missing.
Then I typed C-\ chinese-py RET to select a Pinyin input method.
Then I typed nihao and saw two Chinese characters.
--
file-error; Data: (Opening input file no such file or directory ~/.signature)
* Re: Chinese characters support
From: Kai Großjohann @ 2003-05-10 14:28 UTC (permalink / raw)
Charles Muller <acmuller@gol.com> writes:
> When it comes to working with UTF-8, I have never heard of anyone
> succeeding in displaying East Asian scripts without installing the
> TEI-Emacs add-on.
The CVS version of Emacs has utf-translate-cjk-mode which allows me
to do this:
C-x C-f /some/nonexisting/file/name RET
C-u C-\ chinese-py RET
nihao (enter Chinese here)
C-x RET c utf-8 RET
C-x C-s
After this, I get a UTF-8 encoded file with Chinese characters in it.
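The end result of that recipe, a file whose bytes are the UTF-8 encoding of the Chinese text, can be checked outside Emacs. The following is a minimal Python sketch, not Emacs code. The helper `write_utf8`, the file name, and the sample text 你好 are assumptions made for illustration; the thread does not say which two characters the input method produced.

```python
import os
import tempfile


def write_utf8(path: str, text: str) -> bytes:
    """Write text to path as UTF-8 and return the raw bytes found on disk."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with open(path, "rb") as f:
        return f.read()


# Hypothetical sample text and file name, for illustration only.
sample = "你好"
with tempfile.TemporaryDirectory() as tmp:
    raw = write_utf8(os.path.join(tmp, "nihao.txt"), sample)

# Each of these two characters occupies three bytes in UTF-8.
print(raw)                            # b'\xe4\xbd\xa0\xe5\xa5\xbd'
print(raw.decode("utf-8") == sample)  # True
```

Reading the file back in binary mode is what makes this a check of the on-disk encoding rather than of how the text happens to be displayed.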
utf-translate-cjk-mode used to be called utf-translate-cjk. I don't
know when it appeared in Emacs. Probably it isn't in 21.3.
--
file-error; Data: (Opening input file no such file or directory ~/.signature)
* Re: Chinese characters support
From: Charles Muller @ 2003-05-10 16:17 UTC (permalink / raw)
Cc: help-gnu-emacs
Kai wrote:
> Then I typed M-x view-hello-file RET. This showed me some Chinese
> (and Japanese, and Korean) characters. If you see empty boxes
> instead of the Chinese characters, then some fonts are missing.
It should be pointed out, nonetheless, that it is a bad idea to
cite the hello file as an example of international script functionality,
since it is set in an encoding that virtually no one ever uses (at least in
the CJK world), and it is quite often the case that that file will display
fine despite the fact that CJK won't work in utf-8 or native East Asian
encodings. Someone should either get rid of that file or save it in a
relevant encoding.
Chuck
* Re: Chinese characters support
From: Kai Großjohann @ 2003-05-10 16:45 UTC (permalink / raw)
Cc: help-gnu-emacs
Charles Muller <acmuller@gol.com> writes:
> Kai wrote:
>
>> Then I typed M-x view-hello-file RET. This showed me some Chinese
>> (and Japanese, and Korean) characters. If you see empty boxes
>> instead of the Chinese characters, then some fonts are missing.
>
> It should be pointed out, nonetheless, that it is a bad idea to cite
> the hello file as an example of international script functionality,
> since it is set in an encoding that virtually no one ever uses (at
> least in the CJK world),
Really? The HELLO file shows characters from a lot of different
encodings, and if used as such, then it is quite useful.
> and it is quite often the case that that file will display fine
> despite the fact that CJK won't work in utf-8
There are known problems with CJK support in UTF-8, but the situation
has improved greatly in the development version of Emacs.
> or native East Asian encodings.
Can you cite examples? I have had no problem with gb2312 and Chinese
characters, at least. Others routinely use Shift-JIS and EUC-JP for
Japanese, I gather.
> Someone should either get rid of that file or save it in a relevant
> encoding.
The file is in a relevant encoding: it's the encoding used by Emacs
internally. (Or rather, an encoding close to the internal encoding.)
This fact has its disadvantages, but it also has advantages.
--
file-error; Data: (Opening input file no such file or directory ~/.signature)
* Re: Chinese characters support
From: Charles Muller @ 2003-05-10 17:31 UTC (permalink / raw)
Cc: help-gnu-emacs
Kai wrote:
> Really? The HELLO file shows characters from a lot of different
> encodings, and if used as such, then it is quite useful.
>
> > and it is quite often the case that that file will display fine
> > despite the fact that CJK won't work in utf-8
>
> There are known problems with CJK support in UTF-8, but the situation
> has improved greatly in the development version of Emacs.
I know that, and I am not contesting that point. But again, the HELLO file
is not a utf-8 file. It is also not a form of JIS or other East Asian
encoding, so the fact that one can display multilingual scripts by opening
that file does not mean that they will be able to display them in Big5, JIS,
or whatever. If you check the archives for "utf-8+cjk", you will see that we have had a few
threads in the past year that dealt with problems trying to display CJK and
other international scripts, in which the advice was given to look at the
Hello file. As a person who has been working with international scripts and
utf-8 for a number of years, I know firsthand that the ability to read
this file doesn't usually mean much. People who recommend checking this file
are usually people who don't use double-byte East Asian languages.
> The file is in a relevant encoding: it's the encoding used by Emacs
> internally. (Or rather, an encoding close to the internal encoding.)
Relevant to whom? It's not in utf-8, right? Most of the problems people have
been having with CJK display in Emacs (at least until the appearance of
21.3.5) have to do with problems getting utf-8 to work, and the hello file
will still display even when these problems are not resolved.
No one that I know who works in XML or with East Asian international scripts
works in utf-7, so while that encoding format may be relevant for those who
are programming Emacs internally, it is not relevant for anyone using Emacs
to do multilingual XML or HTML publication, because no one uses it. That's
what I mean when I say "not relevant."
It is not my purpose to badmouth Emacs handling of Unicode. I know that
people have been working very hard to resolve these problems, and from what
I have been hearing, once everyone has copies of 21.3.5 installed with the
right Mule setup, this will be a past issue. Hopefully, somewhere along the
line, the Hello file will also graduate to utf-8.
Regards,
Chuck
* Re: Chinese characters support
From: Eli Zaretskii @ 2003-05-10 17:58 UTC (permalink / raw)
> Date: Sun, 11 May 2003 01:17:25 +0900 (JST)
> Newsgroups: gnu.emacs.help
> From: Charles Muller <acmuller@gol.com>
>
> It should be pointed out, nonetheless, that it is a bad idea to
> cite the hello file as an example of international script functionality,
> since it is set in an encoding that virtually no one ever uses (at least in
> the CJK world)
It is certainly useful to see whether Emacs is set up correctly for
its non-ASCII support, including coding systems, fonts, and other
facilities. Whether other software understands the way that file was
encoded is irrelevant for this.
> Someone should either get rid of that file or save it in a
> relevant encoding.
Until Emacs supports the full range of Unicode characters, the
encoding used now to save etc/HELLO is about _the_only_ one that can
do the job. Let me remind you that in the released versions of Emacs,
only a subset of the BMP is supported.
* Re: Chinese characters support
From: Eli Zaretskii @ 2003-05-10 18:43 UTC (permalink / raw)
> Date: Sun, 11 May 2003 02:31:49 +0900 (JST)
> From: Charles Muller <acmuller@gol.com>
>
> the HELLO file
> is not a utf-8 file. It is also not a form of JIS or other East Asian
> encoding, so the fact that one can display multilingual scripts by opening
> that file does not mean that they will be able to display them in Big5, JIS,
> or whatever.
It does demonstrate that Emacs can display, read, and write Chinese
characters, Japanese characters, and other characters. UTF-8 is not
the only way to do that, and there's lots of other non-trivial
machinery, both inside Emacs and outside it, that should be set up
correctly for it to be able to display etc/HELLO, even without UTF-8.
> If you check the archives for "utf-8+cjk", you will see that we have had a few
> threads in the past year that dealt with problems trying to display CJK and
> other international scripts, in which the advice was given to look at the
> Hello file.
If you read the archives of this forum (and of gnu.emacs.bug), you
will see that it's been recommended _a_lot_.
> It is not my purpose to badmouth Emacs handling of Unicode.
However, you've actually done precisely that.
* Re: Chinese characters support
From: Kai Großjohann @ 2003-05-10 19:24 UTC (permalink / raw)
Cc: help-gnu-emacs
Charles Muller <acmuller@gol.com> writes:
> Kai wrote:
>
>> Really? The HELLO file shows characters from a lot of different
>> encodings, and if used as such, then it is quite useful.
>>
>> > and it is quite often the case that that file will display fine
>> > despite the fact that CJK won't work in utf-8
>>
>> There are known problems with CJK support in UTF-8, but the situation
>> has improved greatly in the development version of Emacs.
>
> I know that, and I am not contesting that point. But again, the
> HELLO file is not a utf-8 file. It is also not a form of JIS or
> other East Asian encoding, so the fact that one can display
> multilingual scripts by opening that file does not mean that they
> will be able to display them in Big5, JIS, or whatever. If you check
> the archives for "utf-8+cjk", you will see that we have had a few
> threads in the past year that dealt with problems trying to display
> CJK and other international scripts, in which the advice was given
> to look at the Hello file. As a person who has been working with
> international scripts and utf-8 for a number of years, I know firsthand
> that the ability to read this file doesn't usually mean
> much. People who recommend checking this file are usually people who
> don't use double-byte East Asian languages.
I know that I often suggest that people have a look at the HELLO file.
I do that to answer one question: does Emacs find the right fonts?
While a correct-looking HELLO file is not sufficient for correct
functioning of CJK, it is a requirement. That is, if HELLO looks
bad, that needs to be fixed first.
However, I don't use CJK myself (much), and therefore I am quite
ready to be proven wrong. Is it common to have a garbled HELLO file
but still CJK is working right?
--
This line is not blank.
* Re: Chinese characters support
From: Charles Muller @ 2003-05-11 2:11 UTC (permalink / raw)
Eli wrote:
> If you read the archives of this forum (and of gnu.emacs.bug), you
> will see that it's been recommended _a_lot_.
>
> > It is not my purpose to badmouth Emacs handling of Unicode.
>
> However, you've actually done precisely that.
All I'm trying to do, as a person who does a lot of work with CJK and utf-8,
is point out something that can be done a little better. I can't understand
why you guys can't simply say "hmm.. perhaps this is something we should
look into." What's the big deal?
Chuck
* Re: Chinese characters support
From: Charles Muller @ 2003-05-11 2:15 UTC (permalink / raw)
Kai wrote:
> However, I don't use CJK myself (much), and therefore I am quite
> ready to be proven wrong. Is it common to have a garbled HELLO file
> but still CJK is working right?
No, it is exactly the opposite. It is common for one to be able to display
the HELLO file without problems, but to still have difficulty displaying CJK
in other encodings, especially UTF-8.
Chuck
* Re: Chinese characters support
From: Eli Zaretskii @ 2003-05-11 3:32 UTC (permalink / raw)
Cc: help-gnu-emacs
> Date: Sun, 11 May 2003 11:11:48 +0900 (JST)
> From: Charles Muller <acmuller@gol.com>
>
> Eli wrote:
>
> > If you read the archives of this forum (and of gnu.emacs.bug), you
> > will see that it's been recommended _a_lot_.
> >
> > > It is not my purpose to badmouth Emacs handling of Unicode.
> >
> > However, you've actually done precisely that.
>
> All I'm trying to do, as a person who does a lot of work with CJK and utf-8
> is point out something that can be done a little better. I can't understand
> why you guys can't simply say "hmm.. perhaps this is something we should
> look into." What's the big deal?
Is anything but complete acceptance of your opinions going to convince
you that we know what we are talking about?
Look, all I was trying to do, as a person who did some work in this
area for Emacs, and as someone who does get to answer lots of
questions about this, is to tell you that etc/HELLO _is_ useful, and
that the last thing I'd expect Emacs maintainers to do is to remove
it. Why cannot you accept that?
It goes without saying that in a Unicode Emacs, the one that will have
its internal representation of characters based on Unicode codepoints,
HELLO will be recoded, either in that internal representation or in
UTF-8. But until that happens, I don't see any reason to recode it.
* Re: Chinese characters support
From: Eli Zaretskii @ 2003-05-11 3:34 UTC (permalink / raw)
> Date: Sun, 11 May 2003 11:15:32 +0900 (JST)
> From: Charles Muller <acmuller@gol.com>
>
> No, it is exactly the opposite. It is common for one to be able to display
> the HELLO file without problems, but to still have difficulty displaying CJK
> in other encodings, especially UTF-8.
So etc/HELLO is not the ultimate solution for testing how well CJK is
handled. So what? It certainly helps to test certain aspects of
that. And for a file that requires near-zero maintenance, that's a
lot.
* Re: Chinese characters support
From: Charles Muller @ 2003-05-11 13:59 UTC (permalink / raw)
Cc: help-gnu-emacs
Eli wrote:
> Is anything but complete acceptance of your opinions going to convince
> you that we know what we are talking about?
It seems like this discussion has expanded out of proportion.
> Look, all I was trying to do, as a person who did some work in this
> area for Emacs, and as someone who does get to answer lots of
> questions about this, is to tell you that etc/HELLO _is_ useful, and
> that the last thing I'd expect Emacs maintainers to do is to remove
> it. Why cannot you accept that?
I never requested the removal of the HELLO file. All I said was that as long
as it was maintained in utf-7, it was not especially useful as a test file for
people who are trying to get their CJK working right.
> It goes without saying that in a Unicode Emacs, the one that will have
> its internal representation of characters based on Unicode codepoints,
> HELLO will be recoded, either in that internal representation or in
> UTF-8.
This seems like a good approach. I appreciate your efforts to
understand the issue.
> But until that happens, I don't see any reason to recode it.
Whatever. Hopefully, as Emacs 21.3-5 with the appropriate Mule settings is
widely proliferated, this will be a moot issue. Emacs is a superb piece of
work, and I appreciate the efforts of all its developers to continually
expand its versatility.
Chuck
* Re: Chinese characters support
From: Jason Rumney @ 2003-05-12 19:27 UTC (permalink / raw)
Charles Muller <acmuller@gol.com> writes:
> If you check the archives for "utf-8+cjk", you will see that we have
> had a few threads in the past year that dealt with problems trying
> to display CJK and other international scripts, in which the advice
> was given to look at the Hello file. As a person who has been
> working with international scripts and utf-8 for a number of years, I
> know firsthand that the ability to read this file doesn't
> usually mean much.
If HELLO displays correctly, it means that Emacs has all it needs in
order to display characters, and the problem lies in the
encoding/decoding process. If we recoded HELLO into UTF-8, then if
people were having problems displaying utf-8 encoded text, looking at
HELLO would just not work. At least now, it is a way to quickly
narrow down the problem.
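The split between "can display the characters" and "can decode the bytes" can be illustrated outside Emacs. The following is a minimal Python sketch, not Emacs code: the helper `clean_decoders` and the sample text are assumptions made for illustration. It shows the decoding stage succeeding or failing on its own, before any font is involved.

```python
def clean_decoders(raw: bytes, codecs=("utf-8", "gb2312", "big5")):
    """Return the names of the codecs that decode raw without an error."""
    ok = []
    for name in codecs:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


# Hypothetical sample: the same two characters in two different encodings.
text = "你好"
gb_bytes = text.encode("gb2312")
utf8_bytes = text.encode("utf-8")

# The characters are identical; only the byte representation differs,
# so a decoder that handles one byte stream can still reject the other.
print(clean_decoders(gb_bytes))
print(clean_decoders(utf8_bytes))
```

A codec that decodes without error is not necessarily the right one, since bytes written in one double-byte encoding can decode "cleanly" into the wrong characters under another. The sketch only narrows down candidates, in the same spirit as the HELLO check narrows down the problem.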
> People who recommend checking this file are usually people who don't
> use double-byte East Asian languages.
No, they are people that know how Emacs works, and are trying to help
by narrowing down the problem.
* Re: Chinese characters support
From: Jason Rumney @ 2003-05-12 19:29 UTC (permalink / raw)
Charles Muller <acmuller@gol.com> writes:
> I never requested the removal of the HELLO file. All I said was that as long
> as it was maintained in utf-7,
It has never been maintained in utf-7, and no suggestion has ever
been made to encode it in utf-7. The current encoding is based on
iso-2022 AFAIK.
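For readers unfamiliar with the iso-2022 family: it keeps the byte stream 7-bit and switches character sets with escape sequences. The following is a minimal Python sketch using the standard `iso2022_jp` codec, which is one well-known iso-2022 profile. It is an assumption-laden illustration of the mechanism only, not the encoding actually used for etc/HELLO.

```python
ESC = b"\x1b"

# Sample Japanese text chosen for illustration.
encoded = "日本語".encode("iso2022_jp")

# Character-set switches are signalled by escape sequences, and every
# byte in the stream stays below 0x80.
print(encoded.startswith(ESC + b"$B"))  # True: ESC $ B designates JIS X 0208
print(encoded.endswith(ESC + b"(B"))    # True: ESC ( B returns to ASCII
print(all(b < 0x80 for b in encoded))   # True
```

Because several character sets can be designated in one stream this way, an iso-2022-style file can mix scripts without relying on Unicode.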
* Re: Chinese characters support
From: Kai Großjohann @ 2003-05-12 19:56 UTC (permalink / raw)
Charles Muller <acmuller@gol.com> writes:
> Kai wrote:
>
>> However, I don't use CJK myself (much), and therefore I am quite
>> ready to be proven wrong. Is it common to have a garbled HELLO file
>> but still CJK is working right?
>
> No, it is exactly the opposite. It is common for one to be able to
> display the HELLO file without problems, but to still have
> difficulty displaying CJK in other encodings, especially UTF-8.
Well, that's good, then. The HELLO file does exactly what I need: it
helps me to narrow down the area where people are having problems.
If HELLO displays fine, then the problem is not the fonts.
Maybe HELLO does not do what you need, but that doesn't make it
useless in general, IMVHO.
--
This line is not blank.
* Re: Chinese characters support
From: Kai Großjohann @ 2003-05-12 19:58 UTC (permalink / raw)
Charles Muller <acmuller@gol.com> writes:
> I never requested the removal of the HELLO file. All I said was that as long
> as it was maintained in utf-7,
(It's not in utf-7. Though it's not relevant here.)
> it was not especially useful as a test file for people who are trying
> to get their CJK working right.
It's *very* useful. It halves the problem space.
--
This line is not blank.
* Re: Chinese characters support
From: Michael Na Li @ 2003-05-12 23:05 UTC (permalink / raw)
On 10 May 2003, Kai Großjohann spake thusly:
> Gaoyan Xie <gxie@eecs.wsu.edu> writes:
>
> > I am trying to explore GNU emacs's multilingual support, and what I
> > want is the display and input of Chinese characters. Have any of you
> > done this before? I tried according to GNU emacs' online manual, but
> > still couldn't make it work. BTW, I am using Redhat Linux 7.2 and GNU
> > emacs 20.7.
>
> I don't know anything about Chinese support in general. But with
> Emacs, it was very easy.
>
> I compiled and installed Emacs and I also installed some Chinese
> fonts. (The GNU intlfonts package, available from ftp.gnu.org, is a
> good starting point.)
>
> Then I typed M-x view-hello-file RET. This showed me some Chinese
> (and Japanese, and Korean) characters. If you see empty boxes
> instead of the Chinese characters, then some fonts are missing.
>
> Then I typed C-\ chinese-py RET to select a Pinyin input method.
Don't you need M-x set-language-environment RET Chinese-GB RET so that the
file is saved in the gb2312 coding system?
The chinese-py-punct input method also provides ways to input Chinese-style
punctuation.
Michael
* Re: Chinese characters support
From: Charles Muller @ 2003-05-13 3:36 UTC (permalink / raw)
Cc: help-gnu-emacs
Kai wrote:
> Maybe HELLO does not do what you need, but that doesn't make it
> useless in general, IMVHO.
I never said it was useless in general, and have never
suggested that the HELLO file should be relegated to oblivion.
One more time:
Since the HELLO file is used for internal testing by Emacs coders, it almost
always works correctly in any recent Emacs "out of the box."
The common misunderstanding occurs when people who are trying to get
CJK working in utf-8 write to this or another list for help, and list
members, in the spirit of trying to be helpful, suggest that all is fine if
the HELLO file displays right.
But since the HELLO file is encoded in iso-2022 (not utf-7, as I originally
stated), it is the case, in my fairly extensive experience with the matter,
that the HELLO file will almost invariably display fine, while the original
problem (usually Mule-related) remains untouched.
Since the people who usually make the suggestion to test via the HELLO file are
those who do not regularly use CJK, it seems that they are not aware of this
discrepancy, and I wanted to point this out.
It seems strange to see people react so emotionally to the exposure
of this simple point. No one is asking that the hallowed HELLO file be sent
to oblivion--although a reincarnation as utf-8 would certainly not hurt! :-)
Chuck
* Re: Chinese characters support
From: Kai Großjohann @ 2003-05-13 7:02 UTC (permalink / raw)
Michael Na Li <lina@u.washington.edu> writes:
> Don't you need M-x set-language-environment RET Chinese-GB RET such that the
> file is saved in gb2312 coding?
It seems to offer gb2312 by default. That's because the characters
in the buffer are gb2312.
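The idea that the offered encoding follows from the characters present can be illustrated outside Emacs. The following is a minimal Python sketch, not a description of how Emacs chooses its default: the helpers `encodable` and `candidate_codecs` and the sample strings are assumptions made for illustration. It lists which encodings can represent a given text at all.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def candidate_codecs(text: str, codecs=("gb2312", "big5", "utf-8")):
    """List the codecs, in the order given, that can represent text."""
    return [name for name in codecs if encodable(text, name)]


# Simplified characters fit gb2312 but not big5; traditional ones the reverse.
print(candidate_codecs("汉字"))  # ['gb2312', 'utf-8']
print(candidate_codecs("漢字"))  # ['big5', 'utf-8']
```

Text typed with a simplified-Chinese input method therefore has gb2312 as a natural candidate even when no language environment has been set.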
But if I was using Chinese all the time, I'd surely set my language
environment to Chinese-GB. However, right now LC_CTYPE is set to
de_DE@euro...
--
This line is not blank.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
[not found] ` <mailman.6084.1052797097.21513.help-gnu-emacs@gnu.org>
@ 2003-05-13 7:05 ` Kai Großjohann
2003-05-14 6:14 ` Lee Sau Dan
1 sibling, 0 replies; 41+ messages in thread
From: Kai Großjohann @ 2003-05-13 7:05 UTC (permalink / raw)
Charles Muller <acmuller@gol.com> writes:
> Since the HELLO file is used for internal testing by Emacs coders it almost
> always works correctly in any recent Emacs "out of the box."
Ah, I see. Actually, some people see empty boxes when they display
HELLO. But you're right, usually it Just Works.
And you're also right in that something more is needed to test
Unicode support in Emacs. It seems that installing Mule-UCS on Emacs
20 also makes it Just Work, more or less. I've had less success
with installing Mule-UCS on Emacs 21 -- there was some double-UTF-8
encoding in messages written with Gnus.
--
This line is not blank.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
[not found] ` <mailman.5927.1052587973.21513.help-gnu-emacs@gnu.org>
2003-05-12 19:27 ` Jason Rumney
@ 2003-05-13 7:40 ` Lee Sau Dan
2003-05-13 10:11 ` acmuller
` (2 more replies)
1 sibling, 3 replies; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-13 7:40 UTC (permalink / raw)
>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:
Charles> I know that, and I am not contesting that point. But
Charles> again, the HELLO file is not a utf-8 file.
I think you're being religious. Why must it be utf-8?
Charles> It is also not a form of JIS or other East Asian
Charles> encoding,
It's emacs-mule encoding --- Emacs's own representation of the
information about characters/encodings that it keeps.
Charles> so the fact that one can display multilingual
Charles> scripts by opening that file does not mean that they will
Charles> be able to display them in Big5, JIS, or whatever.
If one can see the Big5 text in that file, he can see all other Big5
files. If one can see the Thai characters in that file, he can also
see the Thai characters when he opens a Thai text file with the
suitable encoding (the default if he has done
set-language-environment correctly). And so on.
Charles> People who recommend checking this file are usually
Charles> people who don't use double-byte East Asian languages.
Sorry, I use Big5 very often. And I do recommend C-h h as a quick
test to see if he has installed the big5 fonts correctly. (Big5 fonts
do not come with XFree86, and many Linux distros have been ignoring the
"leim" and "intlfont" packages for years.)
>> The file is in a relevant encoding: it's the encoding used by
>> Emacs internally. (Or rather, an encoding close to the
>> internal encoding.)
Charles> Relevant to whom?
To Emacs.
Charles> It's not in utf-8, right?
So what? My .signature is in Big5 and it is not in utf-8, either.
And my .emacs file is in emacs-mule encoding, which is not utf-8,
either. Neither are utf-16 files utf-8.
I think you're being religious when you worship utf-8. For Chinese
text, utf-8 wastes 50% of storage space. I'd rather use utf-16. But
big5 has the same storage efficiency (and more when you include some
English text) and it is more common.
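The storage figures above can be checked with a small sketch. This is only an illustration in Python (not something used in this thread), and the sample string is an assumption picked for the example; it counts the bytes the same traditional Chinese text occupies in utf-8, utf-16 (without a byte-order mark) and big5.

```python
# Sketch: compare how many bytes the same Chinese text needs per encoding.
# The sample string is a hypothetical example, not taken from the thread.
sample = "漢字測試"  # four traditional Chinese characters


def encoded_size(text, codec):
    """Return the number of bytes `text` occupies when encoded with `codec`."""
    return len(text.encode(codec))


# "utf-16-le" is used so that no byte-order mark is counted.
sizes = {codec: encoded_size(sample, codec)
         for codec in ("utf-8", "utf-16-le", "big5")}

for codec, size in sizes.items():
    print(f"{codec}: {size} bytes")

# ASCII text shows the other side of the trade-off: one byte per character
# in utf-8 and big5, two bytes in utf-16.
ascii_sizes = {codec: encoded_size("Emacs", codec)
               for codec in ("utf-8", "utf-16-le", "big5")}
```

For pure Chinese text the utf-8 size is one and a half times the utf-16 or big5 size (three bytes per character instead of two); once ASCII markup or English is mixed in, the ratio shifts back toward utf-8 and big5.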
Charles> No one that I know who works in XML or with East Asian
Charles> international scripts works in utf-7,
And for XML in Chinese, utf-8 wastes lots of space. To be practical,
we often use big5 for XML files with Chinese.
Charles> so while that encoding format may be relevant for those
Charles> who are programming Emacs internally, it is not relevant
Charles> for anyone using Emacs to do multilingual XML or HTML
Charles> publication, because no one uses it. That's what I mean
Charles> when I say "not relevant."
My experience with Emacs's utf-8 <--> internal conversion has been
good.
--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
[not found] ` <mailman.5922.1052583563.21513.help-gnu-emacs@gnu.org>
@ 2003-05-13 7:40 ` Lee Sau Dan
0 siblings, 0 replies; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-13 7:40 UTC (permalink / raw)
>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:
Charles> Kai wrote:
>> Then I typed M-x view-hello-file RET. This showed me some
>> Chinese (and Japanese, and Korean) characters. If you see
>> empty boxes instead of the Chinese characters, then some fonts
>> are missing.
Charles> It should be pointed out, nonetheless, that it is a bad
Charles> idea to cite the hello file as an example of
Charles> international script functionality,
Why not? That file really illustrates the international script
functionality.
Charles> since it is set in an encoding that virtually no one ever
Charles> uses (at least in the CJK world),
That's a problem with encoding, not Emacs's international script
functionality. Maybe, you have "conformance to Unicode and national
encodings" in mind when you said "international script functionality".
They're different issues.
Charles> and it is quite often the case that that file will
Charles> display fine despite the fact that CJK won't work in
Charles> utf-8 or native East Asian encodings.
C-x RET c utf-8 C-x s ... does save my Chinese text files in UTF-8.
C-x RET c big5 C-x s ... does save my Chinese text files in BIG5 --
the "native" encoding for traditional Chinese.
And needless to say, I can read files in UTF-8 and big5 using C-x RET
c ... C-x C-f. (For Emacs 20, I need to install an external package
for Unicode encodings: MuleUCS or something like that.)
Charles> Someone should either get rid of that file or save it in
Charles> a relevant encoding.
Since no "native" encoding preserves the details that the emacs-mule
encoding saves, that "showoff" file must be kept in emacs-mule. e.g.
the section "Difference among chinese characters in GB, JIS, KSC,
BIG5" would be impossible with Unicode, GB, JIS, KSC or BIG5. Only
emacs-mule has enough coding space to accommodate all characters from
these encodings without unifying them, so that they can still look
different.
--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
[not found] ` <mailman.5976.1052661651.21513.help-gnu-emacs@gnu.org>
2003-05-12 19:29 ` Jason Rumney
2003-05-12 19:58 ` Kai Großjohann
@ 2003-05-13 7:40 ` Lee Sau Dan
2003-05-13 9:57 ` acmuller
2003-05-13 10:02 ` Robin Hu
2 siblings, 2 replies; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-13 7:40 UTC (permalink / raw)
>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:
Charles> I never requested the removal of the HELLO file. All I
Charles> said was that as long as it was maintained in utf-7,
Are you sure it's utf-7? Then how come it can distinguish characters
in BIG5 and equivalent characters in JIS?
Charles> it was not especially useful as test file for people who
Charles> are trying to get their CJK working right.
It seems to me that when you talk about "CJK", you're actually
referring to "utf-8".
Many people using the CJK parts of Emacs only work with the national
encodings (Big5, GB, JIS, KSC, etc.) and in those cases, Emacs
works excellently.
--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
[not found] ` <mailman.5936.1052589798.21513.help-gnu-emacs@gnu.org>
@ 2003-05-13 7:40 ` Lee Sau Dan
2003-05-14 3:15 ` Eli Zaretskii
[not found] ` <mailman.6156.1052882447.21513.help-gnu-emacs@gnu.org>
0 siblings, 2 replies; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-13 7:40 UTC (permalink / raw)
>>>>> "Eli" == Eli Zaretskii <eliz@elta.co.il> writes:
Eli> Until Emacs supports the full range of Unicode characters,
Eli> the encoding used now to save etc/HELLO is about _the_only_
Eli> one that can do the job. Let me remind you that in the
Eli> released versions of Emacs, only a subset of the BMP is
Eli> supported.
I don't think so. Unicode will never be able to handle the
"Difference among chinese characters in GB, JIS, KSC, BIG5:" section
in the etc/HELLO file. Unicode simply unifies the "equivalent"
characters that show up differently in that section of the HELLO file
into single code points.
--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
2003-05-13 7:40 ` Lee Sau Dan
@ 2003-05-13 9:57 ` acmuller
2003-05-13 10:02 ` Robin Hu
1 sibling, 0 replies; 41+ messages in thread
From: acmuller @ 2003-05-13 9:57 UTC (permalink / raw)
On 5/13/2003, Lee Sau Dan wrote:
>Are you sure it's utf-7? Then how come it can distinguish characters
>in BIG5 and equivalent characters in JIS?
As was corrected in an earlier message, it is iso-2022, not utf-7.
> Charles> it was not especially useful as test file for people who
> Charles> are trying to get their CJK working right.
>
>It seems to me that when you talk about "CJK", you're actually
>referring to "utf-8".
No, I am not. But my discussion from the outset has been centered on utf-8
related problems. As you point out (and as I noted earlier in this thread)
most people are able to get CJK working with localized DBCS encodings
without too much trouble.
I, and the people that I am collaborating with in XML-based data projects,
are all at the bleeding edge using utf-8, and thus I have had to deal with
this problem extensively in trying to get everyone's systems set up
right.
Chuck
Charles Muller <acmuller@gol.com>
Toyo Gakuen University
Digital Dictionary of Buddhism: www.acmuller.net
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
2003-05-13 7:40 ` Lee Sau Dan
2003-05-13 9:57 ` acmuller
@ 2003-05-13 10:02 ` Robin Hu
2003-05-15 8:07 ` Lee Sau Dan
1 sibling, 1 reply; 41+ messages in thread
From: Robin Hu @ 2003-05-13 10:02 UTC (permalink / raw)
>>>>> "Lee" == Lee Sau Dan <danlee@informatik.uni-freiburg.de> writes:
>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:
Lee> Many people using the CJK parts of Emacs only work with the
Lee> national encodings (Big5, GB, JIS, KSC, etc.) and in those
Lee> cases, Emacs works excellently.
I think you are oversimplifying this problem. ;-( Most CJK characters
are not encoded in Big5, GB, JIS, or KSC; that's why the GB coding
standard changed from gb2312 to gbk and then to gb18030. AFAIK, most
Chinese characters also cannot be encoded within Mule, and the existing
Unicode support does not solve this problem.
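A minimal sketch of the coverage gap between the GB standards, written in Python purely for illustration (the helper name `can_encode` and the sample character are assumptions, not anything from Emacs or Mule). The character 镕 is a commonly cited example of a name character missing from gb2312 but present in gbk and gb18030.

```python
# Sketch: test whether a character is representable in each GB standard.
# `can_encode` is a hypothetical helper defined only for this example.
def can_encode(text, codec):
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


rare = "镕"  # U+9555, used in personal names

coverage = {codec: can_encode(rare, codec)
            for codec in ("gb2312", "gbk", "gb18030")}

for codec, ok in coverage.items():
    print(f"{codec}: {'encodable' if ok else 'NOT encodable'}")
```

The same check can be run against "big5" or "euc_jp" for any character of interest; it says nothing about what a particular Emacs version can display, only about what the coded character set itself contains.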
Of course, Emacs is enough for most people most of the time, but I
really hesitate to tell my friend again and again, "Sorry, but
your name (羽中) is not supported by my emacs."
--
The goal of science is to build better mousetraps. The goal of nature
is to build better mice.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
2003-05-13 7:40 ` Lee Sau Dan
@ 2003-05-13 10:11 ` acmuller
2003-05-13 10:54 ` Charles Muller
[not found] ` <mailman.6097.1052826249.21513.help-gnu-emacs@gnu.org>
2 siblings, 0 replies; 41+ messages in thread
From: acmuller @ 2003-05-13 10:11 UTC (permalink / raw)
On 5/13/2003, "Lee Sau Dan" <danlee@informatik.uni-freiburg.de> wrote:
>I think you're being religious when you worship utf-8.
Who said anything about worship? Why the sarcasm? I am an XML developer.
UTF-8 is the standard encoding for XML documents.
See http://www.w3.org/TR/REC-xml
Chuck
Charles Muller <acmuller@gol.com>
Toyo Gakuen University
Digital Dictionary of Buddhism: www.acmuller.net
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
2003-05-13 7:40 ` Lee Sau Dan
2003-05-13 10:11 ` acmuller
@ 2003-05-13 10:54 ` Charles Muller
[not found] ` <mailman.6097.1052826249.21513.help-gnu-emacs@gnu.org>
2 siblings, 0 replies; 41+ messages in thread
From: Charles Muller @ 2003-05-13 10:54 UTC (permalink / raw)
Lee Sau Dan wrote:
> And for XML in Chinese, utf-8 wastes lots of space. To be practical,
> we often use big5 for XML files with Chinese.
That's fine, if all you are doing is Chinese. The documents in my project
include terms from over 15 languages, including Tibetan, Nepalese, Sanskrit,
Pali, and several European languages. Unicode has codepoints for these
characters, while Big5 (and other Chinese codesets) do not.
Chuck
---------------------------
Charles Muller <acmuller@gol.com>
Faculty of Humanities, Toyo Gakuen University
Digital Dictionary of Buddhism and CJKV-English Dictionary [http://www.acmuller.net]
H-Buddhism List Editor [http://www2.h-net.msu.edu/~buddhism/]
Mobile Phone: 090-9310-1787
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
2003-05-13 3:36 ` Charles Muller
@ 2003-05-14 3:14 ` Eli Zaretskii
0 siblings, 0 replies; 41+ messages in thread
From: Eli Zaretskii @ 2003-05-14 3:14 UTC (permalink / raw)
> Date: Tue, 13 May 2003 12:36:28 +0900 (JST)
> Newsgroups: gnu.emacs.help
> From: Charles Muller <acmuller@gol.com>
>
> I never said it was useless in general, and have never
> suggested that the HELLO file should be relegated to oblivion.
Perhaps someone else in this thread did, then.
> Since the HELLO file is used for internal testing by Emacs coders it almost
> always works correctly in any recent Emacs "out of the box."
The ability to display HELLO depends on the local configuration
(fonts), so it is not guaranteed to work on every platform.
> Since the people who usually make the suggestion to test via the HELLO are
> those who do not regularly use CJK, it seems that they are not aware of this
> discrepancy, and I wanted to point this out.
Actually, HELLO was created by people who use CJK every day.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
2003-05-13 7:40 ` Lee Sau Dan
@ 2003-05-14 3:15 ` Eli Zaretskii
[not found] ` <mailman.6156.1052882447.21513.help-gnu-emacs@gnu.org>
1 sibling, 0 replies; 41+ messages in thread
From: Eli Zaretskii @ 2003-05-14 3:15 UTC (permalink / raw)
> From: Lee Sau Dan <danlee@informatik.uni-freiburg.de>
> Newsgroups: gnu.emacs.help
> Date: 13 May 2003 09:40:16 +0200
>
> >>>>> "Eli" == Eli Zaretskii <eliz@elta.co.il> writes:
>
> Eli> Until Emacs supports the full range of Unicode characters,
> Eli> the encoding used now to save etc/HELLO is about _the_only_
> Eli> one that can do the job. Let me remind you that in the
> Eli> released versions of Emacs, only a subset of the BMP is
> Eli> supported.
>
> I don't think so. Unicode will never be able to handle the
> "Difference among chinese characters in GB, JIS, KSC, BIG5:" section
> in the etc/HELLO file.
Unicode doesn't, but the Unicode Emacs will. Trust me ;-)
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
[not found] ` <mailman.6084.1052797097.21513.help-gnu-emacs@gnu.org>
2003-05-13 7:05 ` Kai Großjohann
@ 2003-05-14 6:14 ` Lee Sau Dan
2003-05-14 16:27 ` Kai Großjohann
1 sibling, 1 reply; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-14 6:14 UTC (permalink / raw)
>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:
Charles> One more time:
Charles> Since the HELLO file is used for internal testing by
Charles> Emacs coders it almost always works correctly in any
Charles> recent Emacs "out of the box."
No. If you have problems with the font installation (esp. when none
of your font servers offer the relevant fonts or your sys. admin.
simply doesn't care about your non-English needs), HELLO won't display
the glyphs. It only displays boxes there.
Charles> The common misunderstanding occurs when people who are
Charles> trying to get CJK working in utf-8 write to this, or
Charles> another list for help, and list members, in the spirit of
Charles> trying to be helpful, suggest that all is fine if the
Charles> HELLO file displays right.
For utf-8 testing, I'd refer someone to the test files in the MuleUCS
package.
Charles> Since the people who usually make the suggestion to test
Charles> via the HELLO are those who do not regularly use CJK, it
Charles> seems that they are not aware of this discrepancy, and I
Charles> wanted to point this out.
No. Those people often use CJK regularly. They just don't use utf-8.
Like me (using Big5), they use a national encoding (e.g. GB2312, JIS,
KSC).
Charles> It seems strange to see people react so emotionally to
Charles> the exposure of this simple point. No one is asking that
Charles> the hallowed HELLO file be sent to oblivion--although a
Charles> reincarnation as utf-8 would certainly not hurt! :-)
That WILL certainly HURT. Look carefully at the section "Difference
among chinese characters in GB, JIS, KSC, BIG5:" in HELLO. The same
thing cannot be reproduced in vanilla utf-8, because Unicode unifies
the various characters in these encodings into one single code point.
(Most efforts in the earlier versions of Unicode were devoted to
_unifying_ characters from different languages, employing different
national encodings. The result is that you can no longer tell whether a
unified character is from Korean, Japanese or Chinese, which write it
in slightly different ways.)
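The unification described above can be seen in a short sketch. It is written in Python only for illustration and uses 直 as an assumed sample character (it is a frequently cited case of regional glyph differences; whether it is one of the characters shown in HELLO is not claimed here). The character has a different byte sequence in each national encoding, yet decoding all of them yields the same Unicode code point, so the origin is lost.

```python
# Sketch: one ideograph, four national encodings, a single Unicode code point.
# The sample character is an assumption chosen for illustration.
char = "直"
national_codecs = ("gb2312", "euc_jp", "euc_kr", "big5")

# Byte sequence of the character in each national encoding.
encoded = {codec: char.encode(codec) for codec in national_codecs}

# Decoding each byte sequence back gives the Unicode code point.
code_points = {ord(raw.decode(codec)) for codec, raw in encoded.items()}

for codec, raw in encoded.items():
    print(f"{codec}: bytes={raw.hex()} -> U+{ord(raw.decode(codec)):04X}")
```

After the round trip nothing records which national set the character came from; a representation that tags each character with its source character set keeps that distinction.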
If you want to test UTF-8 (Why not UTF-16? People who really use
computers for Far East languages (CJK) would have to waste 50% disk
space if they use UTF-8 to store their text files. UTF-16 is more
space efficient.), do suggest including a UTF-8 test file. (Add a
line in HELLO to instruct anyone how to open the UTF-8 test file,
preferably with hot-key bindings.) And why stop there? Also have
UTF-16 and UTF-7 test files. UTF-8 is simply NOT the magic panacea.
It sucks when you have a file full of Chinese characters, for
instance. The 3-byte per Chinese character "feature" of UTF-8 sucks.
HELLO should remain a test file for the internal encoding "emacs-mule"
and for displaying the true multilingual capabilities of Emacs. It
has also been serving well to test font installation. It should never
be recoded in utf-8, IMO. If all you care about is UTF-8, have
another test file. Assuming that all CJK users should use UTF-8 is
like assuming that everyone should pledge faith to the Vatican.
--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
2003-05-14 6:14 ` Lee Sau Dan
@ 2003-05-14 16:27 ` Kai Großjohann
2003-05-14 21:07 ` Jason Rumney
0 siblings, 1 reply; 41+ messages in thread
From: Kai Großjohann @ 2003-05-14 16:27 UTC (permalink / raw)
Lee Sau Dan <danlee@informatik.uni-freiburg.de> writes:
> If you want to test UTF-8 (Why not UTF-16? People who really use
> computers for Far East languages (CJK) would have to waste 50% disk
> space if they use UTF-8 to store their text files. UTF-16 is more
> space efficient.), do suggest including a UTF-8 test file. (Add a
> line in HELLO to instruct anyone how to open the UTF-8 test file,
> preferably with hot-key bindings.) And why stop there? Also have
> UTF-16 and UTF-7 test files. UTF-8 is simply NOT the magic panacea.
> It sucks when you have a file full of Chinese characters, for
> instance. The 3-byte per Chinese character "feature" of UTF-8 sucks.
Why not include UTF-8 characters in the HELLO file? I gather that
iso-2022 is general enough to also allow using UTF-8 as one of the
encodings it supports.
But I'm not an expert.
--
This line is not blank.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
2003-05-14 16:27 ` Kai Großjohann
@ 2003-05-14 21:07 ` Jason Rumney
0 siblings, 0 replies; 41+ messages in thread
From: Jason Rumney @ 2003-05-14 21:07 UTC (permalink / raw)
kai.grossjohann@gmx.net (Kai Großjohann) writes:
> Why not include UTF-8 characters in the HELLO file? I gather that
> iso-2022 is general enough to also allow using UTF-8 as one of the
> encodings it supports.
compound-text-with-extensions does, but not pure iso-2022 AFAIK.
Anyway, HELLO is in emacs-mule encoding, which is a 16-bit encoding
based on iso-2022, so only supports a certain fixed number of
character sets (most iso-2022 based encodings only support 2 or 3
character sets). CVS Emacs does contain some unicode text in HELLO,
but only a subset of Unicode is supported without conversion (hence
the existence of `utf-translate-cjk-mode' which causes the large
translation tables to be loaded).
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
2003-05-13 10:02 ` Robin Hu
@ 2003-05-15 8:07 ` Lee Sau Dan
0 siblings, 0 replies; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-15 8:07 UTC (permalink / raw)
>>>>> "Robin" == Robin Hu <huxw@knight.6test.edu.cn> writes:
>>>>> "Lee" == Lee Sau Dan <danlee@informatik.uni-freiburg.de> writes:
>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:
Lee> Many people using the CJK parts of Emacs only work with the
Lee> national encodings (Big5, GB, JIS, KSC, etc.) and in those
Lee> cases, Emacs works excellently.
Robin> I think you are oversimplifying this problem. ;-( Most
Robin> CJK characters are not encoded in Big5, GB, JIS,
Robin> or KSC; that's why the GB coding standard changed from
Robin> gb2312 to gbk and then to gb18030.
It depends on what you mean by "most". Yes, if you include those 10s
of thousands of *rare* characters, then even Unicode can fall short.
Most Chinese text, for instance, uses around 5000 distinct characters
only, of which around 1000 account for more than 90% of the
characters in a text. Big5 is quite sufficient for normal use. If it
were not, the Chinese people would have thrown it away (e.g. in favour of
Unicode). Similarly, Japanese texts employ around 3000 distinct
characters, and there is a government standard list of characters to
use. Characters outside that list should, in theory, be avoided.
The characters in JIS are based on this set, AFAIK.
Robin> AFAIK, most Chinese characters also cannot be encoded within
Robin> Mule, and the existing Unicode support does not solve this
Robin> problem.
As long as 99.99% of the characters that I need for Chinese text files
can be encoded in Big5 and emacs-mule, what's the problem?
Robin> Of course, Emacs is enough for most people most of the
Robin> time, but I really hesitate to tell my friend again and
Robin> again, "Sorry, but your name (羽中) is not supported by
Robin> my emacs."
No, that's not my name. I think Gnus sets the charset of my postings
to big5.
And which Emacs is your emacs? Emacs since version 20 has been
displaying Chinese (I can't speak for Japanese and Korean) very
satisfactorily. And I find it, together with Gnus, to be the most
practical tool on Linux to read/write Chinese files/news/mails.
--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
[not found] ` <mailman.6097.1052826249.21513.help-gnu-emacs@gnu.org>
@ 2003-05-15 8:07 ` Lee Sau Dan
0 siblings, 0 replies; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-15 8:07 UTC (permalink / raw)
>>>>> "Charles" == Charles Muller <acmuller@gol.com> writes:
Charles> Lee Sau Dan wrote:
>> And for XML in Chinese, utf-8 wastes lots of space. To be
>> practical, we often use big5 for XML files with Chinese.
Charles> That's fine, if all you are doing is Chinese. The
Charles> documents in my project include terms from over 15
Charles> languages, including Tibetan, Nepalese, Sanskrit, Pali,
Charles> and several European languages. Unicode has codepoints
Charles> for these characters, while Big5 (and other Chinese
Charles> codesets) do not.
The emacs-mule encoding also has code points for these characters.
Moreover, it can distinguish big5 characters from JIS characters (see
the "Difference among chinese characters in GB, JIS, KSC, BIG5:"
section in the HELLO file.), while Unicode (and hence utf-7, utf-8,
utf-16, ucs2, ucs4) cannot.
--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
[not found] ` <mailman.6156.1052882447.21513.help-gnu-emacs@gnu.org>
@ 2003-05-15 8:07 ` Lee Sau Dan
2003-05-16 11:36 ` Eli Zaretskii
0 siblings, 1 reply; 41+ messages in thread
From: Lee Sau Dan @ 2003-05-15 8:07 UTC (permalink / raw)
>>>>> "Eli" == Eli Zaretskii <eliz@elta.co.il> writes:
Eli> Until Emacs supports the full range of Unicode characters,
Eli> the encoding used now to save etc/HELLO is about _the_only_
Eli> one that can do the job. Let me remind you that in the
Eli> released versions of Emacs, only a subset of the BMP is
Eli> supported.
>> I don't think so. Unicode will never be able to handle the
>> "Difference among chinese characters in GB, JIS, KSC, BIG5:"
>> section in the etc/HELLO file.
Eli> Unicode doesn't, but the Unicode Emacs will. Trust me ;-)
So, you're agreeing that converting HELLO to utf-8 (which only
represents Unicode) is not a good idea? Or are you resorting to dirty
tricks using the Private Use Area?
--
Lee Sau Dan 李守敦(Big5) ~{@nJX6X~}(HZ)
E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: Chinese characters support
2003-05-15 8:07 ` Lee Sau Dan
@ 2003-05-16 11:36 ` Eli Zaretskii
0 siblings, 0 replies; 41+ messages in thread
From: Eli Zaretskii @ 2003-05-16 11:36 UTC (permalink / raw)
> From: Lee Sau Dan <danlee@informatik.uni-freiburg.de>
> Newsgroups: gnu.emacs.help
> Date: 15 May 2003 10:07:01 +0200
>
> >> I don't think so. Unicode will never be able to handle the
> >> "Difference among chinese characters in GB, JIS, KSC, BIG5:"
> >> section in the etc/HELLO file.
>
> Eli> Unicode doesn't, but the Unicode Emacs will. Trust me ;-)
>
> So, you're agreeing that converting HELLO to utf-8 (which only
> represents Unicode) is not a good idea?
I don't know yet. AFAIK, the issue of encoding etc/HELLO in the
Unicode Emacs was not discussed yet, but I expect it to be encoded in
the internal Emacs representation of characters, because that by
definition will support all the characters supported by Emacs, and do
that unambiguously.
^ permalink raw reply [flat|nested] 41+ messages in thread
end of thread, other threads:[~2003-05-16 11:36 UTC | newest]
Thread overview: 41+ messages
[not found] <mailman.5730.1052348993.21513.help-gnu-emacs@gnu.org>
2003-05-10 14:26 ` Chinese characters support Kai Großjohann
2003-05-10 16:17 ` Charles Muller
2003-05-10 16:45 ` Kai Großjohann
2003-05-10 17:31 ` Charles Muller
2003-05-10 18:43 ` Eli Zaretskii
2003-05-11 2:11 ` Charles Muller
2003-05-11 3:32 ` Eli Zaretskii
2003-05-11 13:59 ` Charles Muller
[not found] ` <mailman.5976.1052661651.21513.help-gnu-emacs@gnu.org>
2003-05-12 19:29 ` Jason Rumney
2003-05-12 19:58 ` Kai Großjohann
2003-05-13 7:40 ` Lee Sau Dan
2003-05-13 9:57 ` acmuller
2003-05-13 10:02 ` Robin Hu
2003-05-15 8:07 ` Lee Sau Dan
2003-05-10 19:24 ` Kai Großjohann
2003-05-11 2:15 ` Charles Muller
2003-05-11 3:34 ` Eli Zaretskii
[not found] ` <mailman.5956.1052619415.21513.help-gnu-emacs@gnu.org>
2003-05-12 19:56 ` Kai Großjohann
2003-05-13 3:36 ` Charles Muller
2003-05-14 3:14 ` Eli Zaretskii
[not found] ` <mailman.6084.1052797097.21513.help-gnu-emacs@gnu.org>
2003-05-13 7:05 ` Kai Großjohann
2003-05-14 6:14 ` Lee Sau Dan
2003-05-14 16:27 ` Kai Großjohann
2003-05-14 21:07 ` Jason Rumney
[not found] ` <mailman.5927.1052587973.21513.help-gnu-emacs@gnu.org>
2003-05-12 19:27 ` Jason Rumney
2003-05-13 7:40 ` Lee Sau Dan
2003-05-13 10:11 ` acmuller
2003-05-13 10:54 ` Charles Muller
[not found] ` <mailman.6097.1052826249.21513.help-gnu-emacs@gnu.org>
2003-05-15 8:07 ` Lee Sau Dan
2003-05-10 17:58 ` Eli Zaretskii
[not found] ` <mailman.5936.1052589798.21513.help-gnu-emacs@gnu.org>
2003-05-13 7:40 ` Lee Sau Dan
2003-05-14 3:15 ` Eli Zaretskii
[not found] ` <mailman.6156.1052882447.21513.help-gnu-emacs@gnu.org>
2003-05-15 8:07 ` Lee Sau Dan
2003-05-16 11:36 ` Eli Zaretskii
2003-05-12 23:05 ` Michael Na Li
2003-05-13 7:02 ` Kai Großjohann
[not found] ` <mailman.5922.1052583563.21513.help-gnu-emacs@gnu.org>
2003-05-13 7:40 ` Lee Sau Dan
2003-05-07 23:08 Gaoyan Xie
2003-05-08 6:27 ` Charles Muller
[not found] ` <mailman.5739.1052375326.21513.help-gnu-emacs@gnu.org>
2003-05-08 7:33 ` Robin Hu
2003-05-10 14:28 ` Kai Großjohann