* How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Gerald Wildgruber @ 2002-09-23 16:39 UTC (permalink / raw)
Hello,
I'm trying to make my emacs (GNU Emacs 21.3.50.1 on linux) auto-recognize
the right encoding when visiting files with utf-8 encoding. The emacs info
help entry says on the topic:
"Some coding systems can be recognized or distinguished by which byte
sequences appear in the data. However, there are coding systems that cannot
be distinguished, not even potentially."
Does this also apply to utf-8 encoded files? Is it impossible for emacs to
auto-recognize them (as for example the `file' command on the shell does)?
I'm aware of how to do this with File Variables (either by using the
`-*-...-*-' construct or a local variables list at the end of the file).
Both of them work well. Setting `(prefer-coding-system 'utf-8)' in `.emacs'
also works, but is kind of intrusive as all new files are then using this
encoding by default.
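For reference, the `-*-...-*-' construct mentioned above is a plain-text
cookie near the top of the file, e.g. `-*- coding: utf-8 -*-'. As a rough
illustration outside Emacs, the following Python sketch looks for such a
cookie in the first two lines of a file's bytes. The function name
`find_coding_cookie' and the regular expression are assumptions made for this
sketch (the pattern is modelled on the one Python uses for its own
source-encoding declarations); it is not how Emacs itself parses file
variables.

```python
import re
from typing import Optional

# Matches "coding: NAME" or "coding=NAME". This pattern is an assumption of
# the sketch, modelled on Python's own source-encoding declaration.
_COOKIE = re.compile(rb"coding[:=]\s*([-\w.]+)")


def find_coding_cookie(data: bytes) -> Optional[str]:
    """Return the coding name declared in the first two lines, or None."""
    for line in data.splitlines()[:2]:
        match = _COOKIE.search(line)
        if match:
            return match.group(1).decode("ascii")
    return None


if __name__ == "__main__":
    print(find_coding_cookie(b"<!-- -*- coding: utf-8 -*- -->\n<doc/>\n"))
    print(find_coding_cookie(b"no cookie here\n"))
```

A file without such a cookie yields None, which is exactly the case where
an editor has to fall back on guessing from the byte content.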
Even without file variables, Emacs does correctly recognize the encoding
when visiting latin-1 or latin-9 encoded files. Yet it fails when visiting
utf-8 encoded files. I get the `-:--' abbrev on the encoding part of the
modeline, and letters beyond ascii are messed up.
Can anyone give me a hint on how to make emacs find the correct coding
system (without setting it explicitly through file variables)?
Thanks,
Gerald.
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Jesper Harder @ 2002-09-23 23:35 UTC (permalink / raw)
Gerald Wildgruber <gwil.remove.this.phrase@lrz.uni-muenchen.de> writes:
> I'm trying to make my emacs (GNU Emacs 21.3.50.1 on linux) auto-recognize
> the right encoding when visiting files with utf-8 encoding.
>
> Can anyone give me a hint on how to make emacs find the correct coding
> system (without setting it explicitly through file variables)?
Does it work if you say `M-x prefer-coding-system utf-8' ?
If it doesn't, you can open a file using a specified coding system with:
`C-x RET c utf-8 RET C-x C-f'
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Charles Muller @ 2002-09-24 3:29 UTC (permalink / raw)
Cc: help-gnu-emacs
Jesper Harder wrote:
> > I'm trying to make my emacs (GNU Emacs 21.3.50.1 on linux) auto-recognize
> > the right encoding when visiting files with utf-8 encoding.
Sorry to mention TEI-Emacs again, but its superior ability to identify
document encodings (especially Unicode) is one of the things that I
like best about it.
http://www.tei-c.org/Software/tei-emacs.tar.gz
>Does it work if you say `M-x prefer-coding-system utf-8' ?
> If it doesn't, you can open a file using a specified coding system with:
> `C-x RET c utf-8 RET C-x C-f'
I had also tried these strategies, and they will allow you to edit a
document as UTF-8, but AFAICT they don't solve the problem of having Emacs
recognize these encodings automatically.
Chuck
---------------------------
Charles Muller <acmuller@gol.com>
Faculty of Humanities, Toyo Gakuen University
Digital Dictionary of Buddhism and CJKV-English Dictionary
[http://www.acmuller.net]
Mobile Phone: 090-9310-1787
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Miles Bader @ 2002-09-24 6:27 UTC (permalink / raw)
Charles Muller <acmuller@gol.com> writes:
> Sorry to mention TEI-Emacs again, but its superior ability to identify
> document encodings (especially Unicode) is one of things that I
> like best about it.
I'm sure, but saying `download an entirely new version of emacs' is a
rather heavyweight solution.
If they've hacked emacs to do a better job of this, how did they do it?
Did they ever try to submit their changes back to emacs?
-Miles
--
Would you like fries with that?
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: A. Lucien Meyers @ 2002-09-24 8:26 UTC (permalink / raw)
* Miles Bader <miles@lsi.nec.co.jp>:
> Charles Muller <acmuller@gol.com> writes:
> > Sorry to mention TEI-Emacs again, but its superior ability to
> > identify document encodings (especially Unicode) is one of things
> > that I like best about it.
>
> I'm sure, but saying `download an entirely new version of emacs' is
> a rather heavyweight solution.
>
> If they've hacked emacs to do a better job of this, how did they do
> it? Did they ever try to submit their changes back to emacs?
TEI-Emacs? Huh? What, where, why? Are TEI developers observing GPL?
Lucien
--
If you receive this by error, please delete it and inform the sender.
http://www.consult-meyers.com recommends email encryption using GnuPG.
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Charles Muller @ 2002-09-24 8:59 UTC (permalink / raw)
Miles Bader asked:
>I'm sure, but saying `download an entirely new version of emacs' is a
>rather heavyweight solution.
As I understand it, it is not an entirely new version, in the sense that it
does not replace the basic program files for Emacs itself. But it supplies
its own version of Mule-UCS, and everything concerned with doing
XML/HTML/XSLT in international environments.
> If they've hacked emacs to do a better job of this, how did they do it?
> Did they ever try to submit their changes back to emacs?
I am not a member of the TEI Consortium, so I can't answer this. Certainly,
all the source code for what they've done is readily and openly available
for anyone interested.
All I know is that before I began to use this package, when doing mail in
Emacs (21.2), there was never any automatic recognition of any kind of
non-Western encoding. Now it recognizes Japanese, Korean, Chinese, and
whatever, whether these be in the local encoding (JIS, Big5, etc.) or
Unicode, without me having to do "prefer-coding-system" or anything like
that. As a person working in several languages, Emacs would be of very
little use to me without this.
Chuck
* auto-recognize utf-8 encoded files upon visiting: solved (sort of...)
From: Gerald Wildgruber @ 2002-09-24 11:45 UTC (permalink / raw)
Thanks to everybody who helped answer my question!
What I was trying to do was to make emacs auto-recognize utf-8 encoded
files upon visiting. In the beginning, it didn't.
The problem seems to have been that, with latin-9 as my primary language
environment, utf-8 only appeared at the end of my priority list for
encodings. Emacs then seems to take my utf-8 file as being encoded in one
of the earlier coding systems in the priority list (probably one of the
iso-2022 family). That is an erroneous recognition, and letters beyond
ascii are messed up as a result.
I didn't want utf-8 in first place on the priority list (because then all
newly created files have it as their default encoding), but neither did I
want it in last place.
If you call M-x prefer-coding-system twice, the first time with utf-8 and
the second time with latin-9 as the value, utf-8 is promoted to second
place in the priority list, while latin-9 remains in first place.
Now without any explicit indication of the encoding (e.g. via file
variables) emacs correctly recognizes the encoding, when I'm visiting
utf-8 files.
To achieve this entry order in the priority list PERMANENTLY I simply put
the following two lines, in this order, into my init file:
(prefer-coding-system 'utf-8)
(prefer-coding-system 'latin-9)
I'm sure there is a cleaner and more elegant solution, but it kind of works.
Last remark:
Charles, thanks for your hint on TEI; I gave TEI a long try many years ago,
when SGML came up, they provided very good introductory material to the
whole issue. But I didn't know of their work on emacs. You say that the
unicode stuff didn't work right on emacs 21.2. I compiled an emacs version
from the CVS sources (http://savannah.gnu.org/projects/emacs/) and there
unicode integration seems to be already more evolved than in the official
distribution. Almost everything works very well. Perhaps you should give it
a try. I'm also working with different languages (classical Greek) and I am very
happy with the evolving unicode capabilities of emacs. I think unicode
integration is THE way by which emacs stops being merely a tool for
programmers and addresses a wider audience also in the humanities.
Gerald.
* Re: auto-recognize utf-8 encoded files upon visiting: solved (sort of...)
From: Charles Muller @ 2002-09-24 12:39 UTC (permalink / raw)
Gerald wrote:
> Charles, thanks for your hint on TEI; I gave TEI a long try many years ago,
> when SGML came up, they provided very good introductory material to the
> whole issue. But I didn't know of their work on emacs. You say that the
> unicode stuff didn't work right on emacs 21.2.
It worked OK for me for European languages, and I can do CJK input/output in
Chinese, Japanese, Korean, either in the national encodings or in utf-8. But
regular Emacs 21.2, although it seems to be able to recognize the encodings
of files, does not seem to be able to apply the proper fonts very
well--especially East Asian fonts, when they are mixed in together with Latin.
> I compiled an emacs version
> from the CVS sources (http://savannah.gnu.org/projects/emacs/) and there
> unicode integration seems to be already more evolved than in the official
> distribution. Almost everything works very well. Perhaps you should give it
> a try.
TEI-Emacs applies a "fontifying" process wherein the proper fonts are
applied to all of the different mixed-language codepoints that I am
using. I wonder if this recent package you mention can do that?
In any case, what you have told me here is certainly good news, and I'll
keep it in mind. As you have guessed, I do not use Emacs for programming,
but as an infinitely customizable text-editing environment to carry out
humanities research projects.
The main reason I use TEI-Emacs is not for the Unicode handling, but because I
am running a number of research projects that are structured by TEI-XML. I'm
using their DTD's, style sheets, and everything, so it's a complete package
for me, that takes care of almost everything.
Nonetheless, like you, I'm looking forward to the continued development of the
Emacsen toward Unicode. I know that XEmacs 21.5 will set UTF-8 as the
default encoding. I hope the developers at GNU Emacs have similar intentions
in mind.
Regards,
Chuck
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Eli Zaretskii @ 2002-09-24 15:12 UTC (permalink / raw)
Cc: help-gnu-emacs
On Tue, 24 Sep 2002, Charles Muller wrote:
> All I know is that before I began to use this package, when doing mail in
> Emacs (21.2), there was never any automatic recognition of any kinds of
> non-Western encoding. Now it recognizes Japanese, Korean, Chinese, and
> whatever, whether these be in the local encoding (JIS, Big5, etc.) or
> Unicode, without me having to do "prefer-coding-system" or anything like
> that.
FWIW, my stock Emacs 21.2 recognizes non-Western encodings right out of
the box.
I'm not saying that you were dreaming, just that those problems are not
trivial to reproduce, so a full-blown bug report is in order.
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Dominic Cronin @ 2002-09-24 18:57 UTC (permalink / raw)
On 23 Sep 2002 18:39:19 +0200, Gerald Wildgruber
<gwil.remove.this.phrase@lrz.uni-muenchen.de> wrote:
>
>Hello,
>
>I'm trying to make my emacs (GNU Emacs 21.3.50.1 on linux) auto-recognize
>the right encoding when visiting files with utf-8 encoding. The emacs info
>help entry says on the topic:
>
>"Some coding systems can be recognized or distinguished by which byte
>sequences appear in the data. However, there are coding systems that cannot
>be distinguished, not even potentially."
>
>Does this also apply to utf-8 encoded files? Is it impossible for emacs to
>auto-recognize them (as for example the `file' command on the shell does)?
The RFC for UTF-8 (see http://www.ietf.org/rfc/rfc2279.txt) states:
UTF-8 strings can be fairly reliably recognized as such by a simple
algorithm, i.e. the probability that a string of characters in any
other encoding appears as valid UTF-8 is low, diminishing with
increasing string length.
BTW - the RFC is quite an interesting read: an elegant solution to a
problem.
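The "simple algorithm" the RFC alludes to amounts to checking that every
byte sequence in the data is well-formed UTF-8. A minimal sketch in Python,
assuming the strict built-in `utf-8' codec is an acceptable stand-in for
that check (the helper name `looks_like_utf8' is made up for this example,
and this is not a description of what Emacs does internally):

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (pure ASCII also qualifies)."""
    try:
        data.decode("utf-8")  # strict mode: raises on any malformed sequence
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    text = "Grüße aus München"
    print(looks_like_utf8(text.encode("utf-8")))    # UTF-8 bytes pass
    print(looks_like_utf8(text.encode("latin-1")))  # Latin-1 bytes fail here
```

Pure ASCII passes as well, so a check like this can only tell UTF-8 apart
from other encodings once non-ASCII bytes appear, which is consistent with
the RFC's remark that the chance of a false positive shrinks as the string
grows.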
--
Dominic Cronin
Amsterdam
* tramp
From: Roger Mason @ 2002-09-24 19:05 UTC (permalink / raw)
Hi,
I downloaded tramp-2.0.22 and installed it. I am trying to get it
configured. For now I have to use telnet as the default connection method,
so I executed
(setq tramp-default-method "tm") in the scratch buffer and it returned
"tm"
When I try to get a file from my home dir on a remote machine:
/fred.here.there.everywhere:~/.emacs
I get
Method 'tm' didn't specify a connection function in the mini-buffer.
Obviously I missed something in the configuration. Can anyone help?
Thanks,
Roger Mason
emacs 21.1
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Charles Muller @ 2002-09-25 6:45 UTC (permalink / raw)
Cc: help-gnu-emacs
Eli wrote:
> FWIW, my stock Emacs 21.2 recognizes non-Western encodings right out of
> the box.
Are you saying that if you open up a document in your build of 21.2 that
contains, for instance, Japanese, Chinese, Korean, and Latin characters, all
the fonts display correctly without needing any tweaking at all? If so, this
is the first such case I have heard of among the many colleagues of mine who
are using Emacs.
> I'm not saying that you were dreaming, just that those problems are not
> trivial to reproduce, so a full-blown bug report is in order.
I don't see it as a bug. I see it more as a level of technical capability
that Emacs has not yet reached, but no doubt will some day.
Chuck
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Eli Zaretskii @ 2002-09-25 6:55 UTC (permalink / raw)
Cc: help-gnu-emacs
On Wed, 25 Sep 2002, Charles Muller wrote:
> Are you saying that if you open up a document in your build of 21.2 that
> contains, for instance, Japanese, Chinese, Korean, and Latin characters, all
> the fonts display correctly without needing any tweaking at all?
Yes. Don't you see that when you visit etc/HELLO?
> > I'm not saying that you were dreaming, just that those problems are not
> > trivial to reproduce, so a full-blown bug report is in order.
>
> I don't see it as a bug. I see it more as a level of technical capability
> that Emacs has not yet reached to
That's another definition of a bug ;-)
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Charles Muller @ 2002-09-25 8:07 UTC (permalink / raw)
Cc: help-gnu-emacs
Eli wrote:
> Don't you see that when you visit etc/HELLO?
This is interesting, because the CJK fonts in the HELLO file *do* display
correctly. But when I open up any other utf-8 file on my system without
using TEI-Emacs, they don't display. Odd, isn't it?
Chuck
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Miles Bader @ 2002-09-25 8:23 UTC (permalink / raw)
Charles Muller <acmuller@gol.com> writes:
> Are you saying that if you open up a document in your build of 21.2
> that contains, for instance, Japanese, Chinese, Korean, and Latin
> characters, all the fonts display correctly without needing any
> tweaking at all?
That's such a vague question that it's hard to answer -- certainly I've
never had any particular problems with the above (though the only time
I've ever seen _all_ of them in a single file, is in the `HELLO' file
that comes with emacs).
But for my standard daily usage, everything seems to work fine, with no
font tweaking or whatever. Sometimes I get spam email that emacs fails
to decode correctly, though.
[I do (set-language-environment "Japanese"), which may affect this]
> > I'm not saying that you were dreaming, just that those problems are not
> > trivial to reproduce, so a full-blown bug report is in order.
>
> I don't see it as a bug. I see it more as a level of technical capability
> that Emacs has not yet reached to, but no doubt will some day.
Your original message alluded to problems recognizing certain
encodings; that should be easy to report as a bug -- send a document
which emacs fails to recognize, which it probably should.
-Miles
--
`...the Soviet Union was sliding in to an economic collapse so comprehensive
that in the end its factories produced not goods but bads: finished products
less valuable than the raw materials they were made from.' [The Economist]
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Charles Muller @ 2002-09-25 8:33 UTC (permalink / raw)
Cc: help-gnu-emacs
[-- Attachment #1: Type: Text/Plain, Size: 534 bytes --]
Dear Eli,
I am curious as to why I am able to see fonts in the HELLO file, but not in
any of my other utf-8 encoded text files (unless I use TEI-Emacs). I want
to ask you if you are also able to see the Chinese in the attached text
file. Also, are you using Linux? If so, which distribution?
Thanks,
Chuck
[-- Attachment #2: Lesson11.xml --]
[-- Type: Application/Xml, Size: 1757 bytes --]
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Charles Muller @ 2002-09-25 9:21 UTC (permalink / raw)
Cc: help-gnu-emacs
After Eli wrote ...
> Yes. Don't you see that when you visit etc/HELLO?
I spent quite a bit of time trying to visit utf-8 files on my hard drives in
Emacs 21.2 without the TEI package loaded, and could not view Chinese fonts
in any of them, which is odd, in view of the fact that I can see them in the
HELLO file. Then I tried C-u C-x = on some of the characters in HELLO, and
got this interesting piece of information (for example):
file code: ESC 24 28 43 31 5B
(encoded by coding system iso-2022-7bit-unix)
Unless I am completely misunderstanding something (and I may well be, for I
am not a programmer), if this file is encoded as iso-2022-7bit, it seems that
we should not be using it as a test example of utf-8 functionality. Am I
right? I have written up a small test file in utf-8 that contains just one
line each of Korean, Japanese, and Chinese, in case anyone is interested in
trying it. It displays fine for me with TEI installed, but as gibberish
without it.
http://www.acmuller.net/test.txt
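The file code reported above can be inspected byte by byte. The Python
sketch below assumes only the six hex values quoted from the `C-u C-x ='
output; it shows that all of them are 7-bit and that the sequence begins
with the escape character, which points to an ISO-2022 style designation
rather than to UTF-8 multi-byte text (where every non-ASCII character is
encoded with bytes of 0x80 and above). The variable names are illustrative.

```python
# Hex values copied from the `C-u C-x =` output quoted above.
file_code = bytes.fromhex("1B 24 28 43 31 5B")

starts_with_escape = file_code[0] == 0x1B       # ESC introduces a designation
all_seven_bit = all(b < 0x80 for b in file_code)

if __name__ == "__main__":
    print(file_code)           # the same bytes, shown as ASCII where printable
    print(starts_with_escape)
    print(all_seven_bit)
```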
Chuck
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Charles Muller @ 2002-09-25 9:26 UTC (permalink / raw)
After Eli wrote ...
> Yes. Don't you see that when you visit etc/HELLO?
I spent quite a bit of time trying to visit utf-8 files on my hard drives in
Emacs 21.2 without the TEI package loaded, and could not view East Asian fonts
in any of them, which is odd, in view of the fact that I can see them in the
HELLO file. Then I tried C-u C-x = on some of the characters in HELLO, and
got this interesting piece of information:
(encoded by coding system iso-2022-7bit-unix)
Unless I am completely misunderstanding something (and I may well be, for I
am not a programmer), if this file is encoded as iso-2022-7bit, it seems that
we should not be using it as a test example of utf-8 functionality. Am I
right? I have written up a small test file in utf-8 that contains just one
line each of Korean, Japanese, and Chinese, in case anyone is interested in
trying it. It displays fine for me with TEI-Emacs installed, but as gibberish
without it (Emacs 21.2 on Red Hat 7.3). I have Mule installed in my regular
Emacs package, as well as all the necessary fonts.
http://www.acmuller.net/test.txt
Chuck
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Charles Muller @ 2002-09-25 9:41 UTC (permalink / raw)
I wrote:
> (encoded by coding system iso-2022-7bit-unix)
>
> Unless I am completely misunderstanding something (and I may well be, for I
> am not a programmer) if this file is encoded as iso-2022-7bit, it seems that we should not be using
> it as a test example of utf-8 functionality.
I am pretty sure now that I am right about this, because when I check the
codepoints in this file
> http://www.acmuller.net/test.txt
I get this kind of result:
category: h:Korean
buffer code: 0x93 0xBF 0xF8
file code: 0xEC 0x9B 0x90
(encoded by coding system utf-8-unix)
So I think we can forget about HELLO for the purposes of the present discussion.
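The file code in that output can be checked independently of Emacs. A
minimal Python sketch, assuming only the three bytes quoted above: decoding
them as UTF-8 yields a single character in the Hangul Syllables block, which
agrees with the `h:Korean' category Emacs reported.

```python
# Bytes copied from the "file code" line quoted above.
file_code = bytes([0xEC, 0x9B, 0x90])

char = file_code.decode("utf-8")   # strict decode; raises if malformed
codepoint = ord(char)

# The Hangul Syllables block spans U+AC00..U+D7A3.
is_hangul_syllable = 0xAC00 <= codepoint <= 0xD7A3

if __name__ == "__main__":
    print(f"U+{codepoint:04X}", char, is_hangul_syllable)
```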
Chuck
* Re: auto-recognize utf-8 encoded files upon visiting: solved (sort of...)
From: A. L. Meyers @ 2002-09-25 14:28 UTC (permalink / raw)
Very nice to hear some knowledgeable people reiterate the significance
and importance of unicode in the intellectual history of mankind. And
even nicer to see emacsen in its vanguard.
Lucien
--
If you receive this by error, please delete it and inform the sender.
PGP key fingerprint=F1C0 D9AE 1B18 1405 4DFA B4CC 6DC7 FF78 C76E FB15
To Big Brother Echelon from "spook":
SEAL Team 6 smuggle supercomputer Soviet opus dei Somalia Kabul spy Nazi
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Stefan Monnier <foo@acm.com> @ 2002-09-25 14:55 UTC (permalink / raw)
> Are you saying that if you open up a document in your build of 21.2 that
> contains, for instance, Japanese, Chinese, Korean, and Latin characters,
> all the fonts display correctly without needing any tweaking at all?
Please keep in mind that UTF-8 is not the only encoding in the world.
I had to read the rest of the thread to understand that you meant
"a UTF-8 file with the above chars".
Indeed the stock Emacs distribution doesn't yet handle UTF-8 encoded
asian chars quite right. IIUC the development code is getting closer,
but isn't quite there yet.
To get what you want you currently need to use Mule-UCS (which I gather
is one of the things that are included in TEI-Emacs).
Stefan
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Eli Zaretskii @ 2002-09-26 4:42 UTC (permalink / raw)
Cc: help-gnu-emacs
On Wed, 25 Sep 2002, Charles Muller wrote:
> I am curious as to why I am able to see fonts in the HELLO file, but not in
> any of my other utf-8 encoded text files (unless I use TEI-Emacs).
You need to install Unicode fonts as explained in the file INSTALL in the
Emacs distribution.
> I want
> to ask you if you are also able to see the Chinese in the attached text
> file.
Stock Emacs 21.2 doesn't support utf-8 encoded files if the characters
are outside the Latin-1 region, IIRC.
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Charles Muller @ 2002-09-26 7:00 UTC (permalink / raw)
Eli wrote,
> You need to install Unicode fonts as explained in the file INSTALL in the
> Emacs distribution.
Due to my concerns with CJK and Unicode, this was the first thing I did when
I set up my Emacs.
I should reiterate: Since I already have a solution in the form of
TEI-Emacs, it doesn't matter so much to me personally whether or not this
issue is solved here. But I would like to point out that although there has
been much generous advice offered on the matter, I haven't heard a subsequent
response from anyone who has been *successful* with this, and I mean in a real
UTF-8 environment, and not the 7-bit HELLO file.
This is also true among my colleagues in East Asian studies who are using
Emacs. Every one of them that I know who has handled the problem
successfully has done so through the TEI extension. So until we hear
otherwise from someone who has succeeded, and can tell us how, I would
suggest that it is simply not doable with the present build of Emacs, and
that we should just admit it.
Chuck
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
From: Eli Zaretskii @ 2002-09-26 16:05 UTC (permalink / raw)
Cc: help-gnu-emacs
> From: Charles Muller <acmuller@gol.com>
> Date: Thu, 26 Sep 2002 16:00:18 +0900 (JST)
>
> This is also true among my colleagues in East Asian studies who are using
> Emacs. Every one of them that I know who has handled the problem
> successfully has done so through the TEI extension. So until we hear
> otherwise from someone who has succeeded, and can tell us how, I would
> suggest that it is simply not doable with the present build of Emacs, and
> that we should just admit it.
Stefan already explained that UTF-8 support in Emacs 21.2 does not
include CJK characters. (IIRC, there's an entry in etc/PROBLEMS about
that.) However, your original complaint seemed to indicate that CJK
support doesn't work in general, not only in UTF-8 encoded files,
which would be a much graver problem if it were true.
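To make the distinction concrete: whether a file is valid UTF-8 and
whether it contains CJK characters are separate questions. The following
is a minimal Python sketch, not anything Emacs itself does; the function
name `contains_cjk` and the `CJK_RANGES` table are hypothetical, and the
ranges are only an illustrative subset of the CJK blocks in Unicode.

```python
# Illustrative subset of CJK code point ranges (assumption: not exhaustive).
CJK_RANGES = (
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7A3),  # Hangul Syllables
)


def contains_cjk(data: bytes) -> bool:
    """Decode data as UTF-8 and report whether any character is in CJK_RANGES."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError if not valid UTF-8
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in CJK_RANGES)
```

A UTF-8 file of accented Latin text and a UTF-8 file of Japanese text both
decode cleanly; only the second one exercises the CJK ranges.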
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
2002-09-26 16:05 ` Eli Zaretskii
@ 2002-09-27 0:36 ` Charles Muller
[not found] ` <mailman.1033086929.4506.help-gnu-emacs@gnu.org>
1 sibling, 0 replies; 32+ messages in thread
From: Charles Muller @ 2002-09-27 0:36 UTC (permalink / raw)
Cc: help-gnu-emacs
Eli wrote:
> However, your original complaint seemed to indicate that CJK
> support doesn't work in general, not only in UTF-8 encoded files,
"Seemed to indicate"? What are you talking about? This entire thread has had
"utf-8" in the subject line, and I have mentioned it in every single message.
Chuck
---------------------------
Charles Muller <acmuller@gol.com>
Faculty of Humanities, Toyo Gakuen University
Digital Dictionary of Buddhism and CJKV-English Dictionary
[http://www.acmuller.net]
Mobile Phone: 090-9310-1787
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
[not found] ` <mailman.1033086929.4506.help-gnu-emacs@gnu.org>
@ 2002-09-27 1:42 ` Miles Bader
2002-09-27 7:06 ` Charles Muller
[not found] ` <mailman.1033110323.17834.help-gnu-emacs@gnu.org>
0 siblings, 2 replies; 32+ messages in thread
From: Miles Bader @ 2002-09-27 1:42 UTC (permalink / raw)
Charles Muller <acmuller@gol.com> writes:
> > However, your original complaint seemed to indicate that CJK
> > support doesn't work in general, not only in UTF-8 encoded files,
>
> "Seemed to indicate"? What are you talking about? This entire thread
> has had "utf-8" in the subject line, and I have mentioned it in every
> single message.
The original thread (and thus the `utf-8' in the subject header) was
about emacs `not recognizing utf-8 files,' and the message made it clear
that it was non-CJK utf-8 files that were being talked about (and that
emacs could read them correctly with a bit of assistance).
You then recommended that he use `TEI emacs,' and I asked how they did
a better job of auto-recognition.
You then said `when doing mail in Emacs (21.2), there was never any
automatic recognition of any kinds of non-Western encoding' -- no
mention of utf-8, and as far as I could see, the topic was about
encoding recognition in general, not specifically about utf-8
(the `any kinds of non-Western encoding' being a prime reason for this
-- there are things besides utf-8...).
Only later did you make a post which made it clear that you were talking
about CJK utf-8 support (which indeed emacs does not currently have).
A hint for using netnews: Topics drift, and subject headers are often
inaccurate in followup postings (and so people tend to disregard them).
-Miles
--
Ich bin ein Virus. Mach' mit und kopiere mich in Deine .signature.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
2002-09-27 1:42 ` Miles Bader
@ 2002-09-27 7:06 ` Charles Muller
[not found] ` <mailman.1033110323.17834.help-gnu-emacs@gnu.org>
1 sibling, 0 replies; 32+ messages in thread
From: Charles Muller @ 2002-09-27 7:06 UTC (permalink / raw)
Cc: help-gnu-emacs
Thanks to Miles for his interpretation of the course of the UTF-8 thread.
Most importantly, it has been clearly established that Emacs
does not support CJK in utf-8. That's the point I wanted to make. Hopefully
this shortcoming will be addressed in future releases.
Chuck
---------------------------
Charles Muller <acmuller@gol.com>
Faculty of Humanities, Toyo Gakuen University
Digital Dictionary of Buddhism and CJKV-English Dictionary
[http://www.acmuller.net]
Mobile Phone: 090-9310-1787
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
[not found] ` <mailman.1033110323.17834.help-gnu-emacs@gnu.org>
@ 2002-09-27 9:07 ` Miles Bader
2002-09-27 11:56 ` Kai Großjohann
1 sibling, 0 replies; 32+ messages in thread
From: Miles Bader @ 2002-09-27 9:07 UTC (permalink / raw)
Charles Muller <acmuller@gol.com> writes:
> Most importantly, it has been clearly established that Emacs
> does not support CJK in utf-8. That's the point I wanted to make. Hopefully
> this shortcoming will be addressed in future releases.
That's certainly an important feature to be added (and it will be,
though perhaps not until the full `internal unicode' version of emacs;
I'm not entirely cognizant of the various issues involved). However,
it's really tangential to the original poster's question -- how to make
emacs recognize (non CJK for the moment) utf-8 files more intelligently.
That, presumably, is a much simpler thing to fix.
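For illustration, the byte-sequence check behind this kind of recognition
can be sketched outside Emacs. This is a minimal Python sketch under the
assumption that a file counts as UTF-8 when it has at least one non-ASCII
byte and decodes without error; the function name `looks_like_utf8` is
hypothetical, and this is not how Emacs implements coding-system detection.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if data has non-ASCII bytes and is valid UTF-8."""
    if all(b < 0x80 for b in data):
        # Pure ASCII is valid in UTF-8 and Latin-1 alike, so it tells us nothing.
        return False
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return False
    return True
```

Latin-1 text with accented letters usually fails this check, because an
isolated high byte rarely forms a well-formed UTF-8 sequence, which is why
the two encodings can often be told apart from the bytes alone.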
-Miles
--
"1971 pickup truck; will trade for guns"
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
[not found] ` <mailman.1033110323.17834.help-gnu-emacs@gnu.org>
2002-09-27 9:07 ` Miles Bader
@ 2002-09-27 11:56 ` Kai Großjohann
2002-09-27 14:10 ` Charles Muller
[not found] ` <mailman.1033135767.32171.help-gnu-emacs@gnu.org>
1 sibling, 2 replies; 32+ messages in thread
From: Kai Großjohann @ 2002-09-27 11:56 UTC (permalink / raw)
Charles Muller <acmuller@gol.com> writes:
> Most importantly, it has been clearly established that Emacs
> does not support CJK in utf-8. That's the point I wanted to make. Hopefully
> this shortcoming will be addressed in future releases.
Maybe it is sufficient to install Mule-UCS? I guess that TEI Emacs
is Emacs with Mule-UCS pre-installed (plus some other packages).
kai
--
~/.signature is: umop ap!sdn (Frank Nobis)
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
2002-09-27 11:56 ` Kai Großjohann
@ 2002-09-27 14:10 ` Charles Muller
[not found] ` <mailman.1033135767.32171.help-gnu-emacs@gnu.org>
1 sibling, 0 replies; 32+ messages in thread
From: Charles Muller @ 2002-09-27 14:10 UTC (permalink / raw)
Cc: help-gnu-emacs
Kai wrote:
> Maybe it is sufficient to install Mule-UCS? I guess that TEI Emacs
> is Emacs with Mule-UCS pre-installed (plus some other packages).
TEI-Emacs does install Mule-UCS, but the reason for its ability to do what
it does must be more than that, because I always install Mule-UCS with my
Emacs installations, and they never render CJK fonts in Unicode until I
install the TEI package. Since all of my internet publication and data
compilation has to be done in Unicode, that's always the first thing I
check. But I don't know enough about Lisp
programming to tell you exactly *what* the TEI package does to get this
working. No doubt Sebastian Rahtz, Christian Wittern, and some of the others
who wrote the package would be happy to tell you what the key routines are.
The first priority of the TEI people is to make sure that the SGML/XML/HTML
modes work more precisely and comprehensively than in the standard Emacs
package, since the target audience is mainly humanities scholars who are
using TEI-XML to mark up literary texts. For example, the way PSGML is set up
in the standard package, it is hard to get it to tell the difference between
XML and SGML. They have also added a whole array of DTDs for various
purposes, including distinctions in XHTML/strict/transitional. There is also
an XSLT mode added that allows for adjustments and debugging.
Then, on top of that, because there are so many of us working with mixed
international scripts (including CJK), apparently someone decided to figure
out how to get all the fonts properly recognized.
I am guessing that part of the problem facing the standard installation of
Emacs is that with any other traditional encoding outside of Unicode, such
as Big5, JIS, or KSC, you always have at least one full font set that is
traditionally mapped to the encoding. With Unicode, I don't know of a font
that is designed to work readily in Linux/Emacs, that covers all codepoints
(the way MS Arial Unicode does in Windows, for example). So a function needs
to be added which goes through the document and properly plugs in fonts for
each given codepoint. I am not well enough versed in the technical end to be
able to explain how they have accomplished this over in Oxford.
I noticed a good bit of negative reaction toward TEI-Emacs when I first
mentioned it, where people expressed alarm about the TEI people not caring
about the GPL and not reporting to the GNU development team. I
think that these concerns come as a result of people not really checking
into what the package is, and what it does. It is not a new version of
Emacs, such as XEmacs. It is simply an add-on, that contains mode
enhancements, and some of its own new modes--just the way people are
accustomed to adding on calendar modes, e-mail packages, or whatever.
I know many of the dedicated people in the Text Encoding Initiative very
well, and there is no group around that is more concerned about free
software and donating code. But when you really need to get a certain type
of application set up for a certain use, I don't think you can just write to
the GNU development team and then wait for a future version for it to be
implemented. And after all, the Lisp code for TEI-Emacs is just as openly
available as any other development based on Emacs. And they have also made a
concerted effort to have their add-on support GNU Emacs, rather than XEmacs,
so I really see it as a very positive development that should be learned
from, rather than disparaged.
Chuck
---------------------------
Charles Muller <acmuller@gol.com>
Faculty of Humanities, Toyo Gakuen University
Digital Dictionary of Buddhism and CJKV-English Dictionary
[http://www.acmuller.net]
Mobile Phone: 090-9310-1787
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
[not found] ` <mailman.1033135767.32171.help-gnu-emacs@gnu.org>
@ 2002-09-27 14:41 ` Miles Bader
2002-09-27 15:54 ` Stefan Monnier <foo@acm.com>
1 sibling, 0 replies; 32+ messages in thread
From: Miles Bader @ 2002-09-27 14:41 UTC (permalink / raw)
Charles Muller <acmuller@gol.com> writes:
> I noticed a good bit of negative reaction toward TEI-Emacs when I
> first mentioned it, where people expressed alarm about the TEI people
> not caring about the GPL and not reporting to the GNU
> development team. I think that these concerns come as a result of
> people not really checking into what the package is, and what it does.
> It is not a new version of Emacs, such as XEmacs. It is simply an
> add-on, that contains mode enhancements, and some of its own new
> modes--just the way people are accustomed to adding on calendar modes,
> e-mail packages, or whatever.
It's perfectly fine for groups such as the TEI to have such add-on
packages, and it's certainly not necessary for them to `write to the GNU
development team and then wait for a future version for it to be
implemented.'
However, for people who have developed big packages, I think it would be
a good thing to try to maintain some sort of `loose contact' (and maybe
they have done so, I don't know), just so that people know what's out
there, and perhaps can avoid duplicating the work they've done. If
some of their stuff is of general utility, often it can be merged into
the emacs distribution, which will lessen the maintenance burden.
For instance, the problem with unicode font coverage you mentioned is
apparently a big annoyance for the people developing the `real' unicode
version of emacs.
I think that in the case of many big packages (e.g. `semantic'), there's
a general awareness in the emacs community that they exist, so formal
contact isn't necessary. I hadn't heard of the TEI work though.
-Miles
--
We live, as we dream -- alone....
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
[not found] ` <mailman.1033135767.32171.help-gnu-emacs@gnu.org>
2002-09-27 14:41 ` Miles Bader
@ 2002-09-27 15:54 ` Stefan Monnier <foo@acm.com>
1 sibling, 0 replies; 32+ messages in thread
From: Stefan Monnier <foo@acm.com> @ 2002-09-27 15:54 UTC (permalink / raw)
> I noticed a good bit of negative reaction toward TEI-Emacs when I first
> mentioned it, where people expressed alarm about the TEI people not caring
> about the GPL and not reporting to the GNU development team.
I didn't react negatively when I asked if they could get (and keep) in touch
with us. I just think it'd be good for both us and them. Almost every time
I bump into such a package as TEI-Emacs which I had never heard of I feel
"what a pity: if I don't know about it despite all the time I spend around
Emacs, who will?".
Also the code in such packages often does both strange/wrong things (either
because of misunderstandings about how some functionality works or
because of a lack of some functionality) and great things (which could
benefit many more people than users of that package).
In either case, it would have been great if they had complained here
(for example) about the problems and shortcomings they encountered.
Stefan
^ permalink raw reply [flat|nested] 32+ messages in thread
Thread overview: 32+ messages
2002-09-23 16:39 How to make emacs auto-recognize utf-8 encoded files upon visiting Gerald Wildgruber
2002-09-23 23:35 ` Jesper Harder
2002-09-24 3:29 ` Charles Muller
[not found] ` <mailman.1032838300.26368.help-gnu-emacs@gnu.org>
2002-09-24 6:27 ` Miles Bader
2002-09-24 8:59 ` Charles Muller
2002-09-24 15:12 ` Eli Zaretskii
2002-09-25 6:45 ` Charles Muller
2002-09-25 6:55 ` Eli Zaretskii
2002-09-25 8:07 ` Charles Muller
2002-09-25 8:33 ` Charles Muller
2002-09-26 4:42 ` Eli Zaretskii
2002-09-26 7:00 ` Charles Muller
2002-09-26 16:05 ` Eli Zaretskii
2002-09-27 0:36 ` Charles Muller
[not found] ` <mailman.1033086929.4506.help-gnu-emacs@gnu.org>
2002-09-27 1:42 ` Miles Bader
2002-09-27 7:06 ` Charles Muller
[not found] ` <mailman.1033110323.17834.help-gnu-emacs@gnu.org>
2002-09-27 9:07 ` Miles Bader
2002-09-27 11:56 ` Kai Großjohann
2002-09-27 14:10 ` Charles Muller
[not found] ` <mailman.1033135767.32171.help-gnu-emacs@gnu.org>
2002-09-27 14:41 ` Miles Bader
2002-09-27 15:54 ` Stefan Monnier <foo@acm.com>
2002-09-25 9:21 ` Charles Muller
2002-09-25 9:26 ` Charles Muller
2002-09-25 9:41 ` Charles Muller
[not found] ` <mailman.1032936261.7964.help-gnu-emacs@gnu.org>
2002-09-25 8:23 ` Miles Bader
2002-09-25 14:55 ` Stefan Monnier <foo@acm.com>
2002-09-24 19:05 ` tramp Roger Mason
[not found] ` <mailman.1032848900.31556.help-gnu-emacs@gnu.org>
2002-09-24 8:26 ` How to make emacs auto-recognize utf-8 encoded files upon visiting A. Lucien Meyers
2002-09-24 11:45 ` auto-recognize utf-8 encoded files upon visiting: solved (sort of...) Gerald Wildgruber
2002-09-24 12:39 ` Charles Muller
[not found] ` <mailman.1032871109.14505.help-gnu-emacs@gnu.org>
2002-09-25 14:28 ` A. L. Meyers
2002-09-24 18:57 ` How to make emacs auto-recognize utf-8 encoded files upon visiting Dominic Cronin