* Can't make emacs use utf-8?? (on WinXP)
@ 2003-06-02 23:55 Bjoern
  2003-06-03  7:19 ` Jason Rumney
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Bjoern @ 2003-06-02 23:55 UTC


Hello,

I've now installed emacs 21.3 and set the Mule language environment to 
UTF-8. So mule-utf-8 appears to be first in the priority order for 
recognizing coding systems when reading files, and also the default 
coding system for new files.

But I can't seem to convert my existing files. If I convert them to 
utf-8 (or mule-utf-8 ?) with C-x RET f utf-8 and save them, their 
encoding is still 'undecided-dos' when I reopen them. The exception is one
HTML file, where it works somehow (the others are .jsp and .java files,
but I've also failed with another HTML file). When I create a text file
just containing 'abc', it also works (recognised as utf-8 after
reopening). However, I tried writing a HelloWorld.java from scratch, and
again it didn't work.

I'm really confused - how can undecided-dos happen at all, if utf-8 is 
first in the priority list?

Many thanks in advance for any help!


Bjoern


P.S.: Here is the output of M-x describe-current-coding:

Coding system for saving this buffer:
   - -- undecided-dos
Default coding system (for new files):
   u -- mule-utf-8-dos
Coding system for keyboard input:
   nil
Coding system for terminal output:
   1 -- iso-latin-1 (alias: iso-8859-1 latin-1)
Defaults for subprocess I/O:
   decoding: u -- mule-utf-8-dos
   encoding: u -- mule-utf-8-dos

Priority order for recognizing coding systems when reading files:
   1. mule-utf-8 (alias: utf-8)
   2. iso-latin-1 (alias: iso-8859-1 latin-1)
   3. iso-2022-jp (alias: junet)
   4. iso-2022-7bit
   5. iso-2022-7bit-lock (alias: iso-2022-int-1)
   6. iso-2022-8bit-ss2
   7. emacs-mule
   8. raw-text
   9. japanese-shift-jis (alias: shift_jis sjis)
   10. chinese-big5 (alias: big5 cn-big5)
   11. no-conversion (alias: binary)

   Other coding systems cannot be distinguished automatically
   from these, and therefore cannot be recognized automatically
   with the present coding system priorities.

   The followings are decoded correctly but recognized as 
iso-2022-7bit-lock:
     iso-2022-7bit-ss2 iso-2022-7bit-lock-ss2 iso-2022-cn iso-2022-cn-ext
     iso-2022-jp-2 iso-2022-kr

Particular coding systems specified for certain file names:

   OPERATION	TARGET PATTERN		CODING SYSTEM(s)
   ---------	--------------		----------------
   File I/O      "\\.elc\\'"             (emacs-mule . emacs-mule)
                 "\\(\\`\\|/\\)loaddefs.el\\'"
                                         (raw-text . raw-text-unix)
                 "\\.tar\\'"             (no-conversion . no-conversion)
                 ""                      find-buffer-file-type-coding-system
   Process I/O	nothing specified
   Network I/O	nothing specified


* Re: Can't make emacs use utf-8?? (on WinXP)
  2003-06-02 23:55 Can't make emacs use utf-8?? (on WinXP) Bjoern
@ 2003-06-03  7:19 ` Jason Rumney
  2003-06-04 15:35   ` Stefan Monnier
  2003-06-03 10:16 ` Benjamin Riefenstahl
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Jason Rumney @ 2003-06-03  7:19 UTC


Bjoern <p2@blinker.net> writes:

> But I can't seem to convert my existing files. If I convert them to
> utf-8 (or mule-utf-8 ?) with C-x RET f utf-8 and save them, their
> encoding is still 'undecided-dos' when I reopen them.

The encoding does not need to be decided until you insert non-ASCII
characters into the buffer.


* Re: Can't make emacs use utf-8?? (on WinXP)
  2003-06-02 23:55 Can't make emacs use utf-8?? (on WinXP) Bjoern
  2003-06-03  7:19 ` Jason Rumney
@ 2003-06-03 10:16 ` Benjamin Riefenstahl
  2003-06-04  0:26   ` Bjoern
  2003-06-03 18:27 ` Eli Zaretskii
  2003-06-05 14:14 ` Stefan Monnier
  3 siblings, 1 reply; 14+ messages in thread
From: Benjamin Riefenstahl @ 2003-06-03 10:16 UTC


Hi Bjoern,


Bjoern <p2@blinker.net> writes:
> But I can't seem to convert my existing files. If I convert them to
> utf-8 (or mule-utf-8 ?) with C-x RET f utf-8 and save them, their
> encoding is still 'undecided-dos' when I reopen them.

Are there any characters in those files that would need converting?

> Except for one html File, where it works somehow (the others are
> .jsp and .java files, but I've also failed with another html
> file). When I create a text file just containing 'abc', it also
> works (recognised as utf-8 after reopening).

A text file that just has ASCII characters is just ASCII.  Emacs has
assigned your default encoding, there is nothing in there to detect
UTF-8 or any other (8-bit) encoding.
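
To illustrate the point with a small Java sketch (the class and method
names are made up for this example): pure ASCII text yields the same
bytes in US-ASCII, ISO-8859-1 and UTF-8, so the bytes on disk carry no
hint of which one was intended.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class AsciiBytes {
    // True if the text encodes to identical bytes in US-ASCII,
    // ISO-8859-1 and UTF-8.
    static boolean sameInAllEncodings(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(ascii, latin1) && Arrays.equals(latin1, utf8);
    }

    public static void main(String[] args) {
        // Pure ASCII: nothing distinguishes the three encodings.
        System.out.println(sameInAllEncodings("abc"));
        // One non-ASCII character (u-umlaut, written as an escape so this
        // source file itself stays ASCII) makes the byte sequences differ.
        System.out.println(sameInAllEncodings("\u00fcber"));
    }
}
```

The first call prints true and the second false: only once a non-ASCII
character appears do the encodings produce different bytes.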

> However I tried writing a HelloWorld.java from scratch, and again it
> didn't work.

Java files should be plain ASCII, if at all possible.  HTML files can
be in UTF-8, but then they should declare the encoding in a <meta>
element.

> I'm really confused - how can undecided-dos happen at all, if utf-8
> is first in the priority list?

Good question.  I would expect that for plain ASCII, but you say that
a plain ASCII file got utf-8 assigned on opening, so that can't be it.
There may be mode-specific specialties.  Actually there *should* be,
as in the case of HTML or XML, but AFAIK, Emacs doesn't have much of
those yet.


so long, benny


* Re: Can't make emacs use utf-8?? (on WinXP)
  2003-06-02 23:55 Can't make emacs use utf-8?? (on WinXP) Bjoern
  2003-06-03  7:19 ` Jason Rumney
  2003-06-03 10:16 ` Benjamin Riefenstahl
@ 2003-06-03 18:27 ` Eli Zaretskii
  2003-06-05 14:14 ` Stefan Monnier
  3 siblings, 0 replies; 14+ messages in thread
From: Eli Zaretskii @ 2003-06-03 18:27 UTC


> From: Bjoern <p2@blinker.net>
> Newsgroups: gnu.emacs.help
> Date: Tue, 03 Jun 2003 01:55:33 +0200
> 
> I'm really confused - how can undecided-dos happen at all, if utf-8 is 
> first in the priority list?

If the buffer doesn't have anything but 7-bit ASCII, Emacs will not
decide on its encoding, as it has no crystal ball to guess which
non-ASCII characters you might wish to insert there.


* Re: Can't make emacs use utf-8?? (on WinXP)
  2003-06-03 10:16 ` Benjamin Riefenstahl
@ 2003-06-04  0:26   ` Bjoern
  2003-06-04 12:52     ` Benjamin Riefenstahl
  0 siblings, 1 reply; 14+ messages in thread
From: Bjoern @ 2003-06-04  0:26 UTC


Benjamin Riefenstahl wrote:
[...]

>>But I can't seem to convert my existing files. If I convert them to
>>utf-8 (or mule-utf-8 ?) with C-x RET f utf-8 and save them, their
>>encoding is still 'undecided-dos' when I reopen them.
>  
> Are there any characters in those files that would need converting?

Sometimes. I guess I can live with the current state now: if I press some
non-ASCII keys, the encoding switches to utf-8 immediately. It turned
out the real problem was with my web server (Tomcat) and the JSTL, which
refuse to set a UTF encoding.

[...]

>>However I tried writing a HelloWorld.java from scratch, and again it
>>didn't work.
> 
> 
> Java files should be plain ASCII, if at all possible.  HTML files can
> be in UTF-8, but then they should declare the encoding in a <meta>
> element.

But wasn't Java the first programming language to promote Unicode? I 
haven't tried it yet, as until yesterday I really stuck to ASCII.

[...]

>>I'm really confused - how can undecided-dos happen at all, if utf-8
>>is first in the priority list?
> 
> 
> Good question.  I would expect that for plain ASCII, but you say that
> a plain ASCII file got utf-8 assigned on opening, so that can't be it.

That's still puzzling me, but it's not so important at the moment.

Thanks for all the help, everyone!


Bjoern


* Re: Can't make emacs use utf-8?? (on WinXP)
  2003-06-04  0:26   ` Bjoern
@ 2003-06-04 12:52     ` Benjamin Riefenstahl
  2003-06-04 13:16       ` Bjoern
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Riefenstahl @ 2003-06-04 12:52 UTC


Hi Bjoern,


Bjoern <p2@blinker.net> writes:
> But wasn't Java the first programming language to promote Unicode?
> I haven't tried it yet, as until yesterday I really stuck to ASCII.

The first thing for Unicode to rule the world (;-)) is that
programmers need to start writing programs that can do Unicode,
e.g. in Java.  Only when it is a common thing can you actually use it
*during* programming.  We are not there yet, there are still quite a
number of other encodings around, and the Java compiler has to cope
with them like any other tool.

So while Java accepts non-ASCII in a number of places (including
identifiers, I believe), for portability reasons you shouldn't just
write them without precautions IMO.  You can either use UCNs (those
\uXXXX codes), or you can use the native2ascii tool that comes with
the JDK to translate to UCNs, or you can just avoid using non-ASCII
characters in code altogether.  The last one is the easiest.
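
As a rough illustration of what such a translation does, here is a
minimal sketch in Java. It is not the native2ascii implementation, just
a hypothetical helper (UnicodeEscaper and escapeNonAscii are invented
names) that replaces every char outside 7-bit ASCII with its \uXXXX
form.

```java
final class UnicodeEscaper {
    // Replaces each UTF-16 code unit above 0x7F with a backslash-u escape
    // of four hex digits; plain ASCII characters pass through unchanged.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Input contains u-umlaut and the euro sign, written as escapes
        // so that this source file itself is pure ASCII.
        System.out.println(escapeNonAscii("\u00fcberhaupt \u20ac"));
    }
}
```

The program prints \u00fcberhaupt \u20ac, which is plain ASCII and so
reads the same whatever ASCII-compatible encoding the compiler assumes.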

There are even those, like me, that believe that the language of
identifiers and comments should be simple English anyway, to maximize
reusability.


so long, benny


* Re: Can't make emacs use utf-8?? (on WinXP)
  2003-06-04 12:52     ` Benjamin Riefenstahl
@ 2003-06-04 13:16       ` Bjoern
  2003-06-04 14:19         ` Robin Hu
                           ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Bjoern @ 2003-06-04 13:16 UTC


Benjamin Riefenstahl wrote:

> Hi Bjoern,
> 
> 
> Bjoern <p2@blinker.net> writes:
> 
>>But wasn't Java the first programming language to promote Unicode?
>>I haven't tried it yet, as until yesterday I really stuck to ASCII.
> 
> 
> The first thing for Unicode to rule the world (;-)) is that

Can't wait ;-)

[...]


> There are even those, like me, that believe that the language of
> identifiers and comments should be simple English anyway, to maximize
> reusability.

True, I wasn't even thinking about that, but what about

System.out.println("some non-ASCII text here");

?


Bjoern


* Re: Can't make emacs use utf-8?? (on WinXP)
  2003-06-04 13:16       ` Bjoern
@ 2003-06-04 14:19         ` Robin Hu
  2003-06-04 14:59         ` Benjamin Riefenstahl
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Robin Hu @ 2003-06-04 14:19 UTC


>>>>> "Bjoern" == Bjoern  <p2@blinker.net> writes:

    >>> Unicode?  I haven't tried it yet, as until yesterday I really
    >>> stuck to ASCII.
    >> The first thing for Unicode to rule the world (;-)) is that

    One coding to rule them all, one coding to find them. ;-)

    >> There are even those, like me, that believe that the language of
    >> identifiers and comments should be simple English anyway, to
    >> maximize reusability.

    Bjoern> True, I wasn't even thinking about that, but what about

    Bjoern> System.out.println("some non-ASCII text here");

    This should be separated out into a resource file if it's part of
    the UI, or else be simple English to maximize reusability, I think.

-- 
The goal of science is to build better mousetraps.  The goal of nature
is to build better mice.


* Re: Can't make emacs use utf-8?? (on WinXP)
  2003-06-04 13:16       ` Bjoern
  2003-06-04 14:19         ` Robin Hu
@ 2003-06-04 14:59         ` Benjamin Riefenstahl
  2003-06-04 20:21         ` Jason Rumney
  2003-06-10 17:53         ` Piet van Oostrum
  3 siblings, 0 replies; 14+ messages in thread
From: Benjamin Riefenstahl @ 2003-06-04 14:59 UTC


Hi Bjoern,


> Benjamin Riefenstahl wrote:
>> There are even those, like me, that believe that the language of
>> identifiers and comments should be simple English anyway, to
>> maximize reusability.

Bjoern <p2@blinker.net> writes:
> True, I wasn't even thinking about that, but what about
>
> System.out.println("some non-ASCII text here");

For my own code I want a universal version of the text in the code
(that will just be in English) so that I have a fallback in case of
problems, and then I have localized versions of it in ResourceBundles
or whatever my framework uses for internationalization / localization.

Instead of having to get the encodings right for all my code, just the
ResourceBundle files need to be in UTF-16 or UTF-8 or whatever works
best.
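
A minimal sketch of that pattern, under a few assumptions: the class
LocalizedText and its methods are invented for illustration, and the
in-memory properties text stands in for a real localized bundle file
(whose name and contents would depend on the project). The English text
lives in the code and is used whenever the bundle has no entry.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.MissingResourceException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

final class LocalizedText {
    // Builds a bundle from properties-style text held in memory
    // (a stand-in for loading a real properties file).
    static ResourceBundle bundleFrom(String properties) {
        try {
            return new PropertyResourceBundle(new StringReader(properties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the localized string for the key, or the English fallback
    // from the code if the bundle has no such entry.
    static String lookup(ResourceBundle bundle, String key, String englishFallback) {
        try {
            return bundle.getString(key);
        } catch (MissingResourceException e) {
            return englishFallback;
        }
    }

    public static void main(String[] args) {
        // The properties text uses escapes, so the non-ASCII German
        // greeting never appears literally in this source file.
        ResourceBundle german = bundleFrom("greeting=Gr\\u00fc\\u00df Gott\n");
        String greeting = lookup(german, "greeting", "Hello");
        String farewell = lookup(german, "farewell", "Goodbye");
        System.out.println(greeting.equals("Gr\u00fc\u00df Gott"));
        System.out.println(farewell);
    }
}
```

Only the bundle text has to get the encoding right; the Java source
stays ASCII and still has a usable English string for every message.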


so long, benny


* Re: Can't make emacs use utf-8?? (on WinXP)
  2003-06-03  7:19 ` Jason Rumney
@ 2003-06-04 15:35   ` Stefan Monnier
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Monnier @ 2003-06-04 15:35 UTC


> The encoding does not need to be decided until you insert non-ASCII
> characters into the buffer.

It doesn't even need to be decided at that point yet.
It only needs to be decided when you save the file with those
non-ASCII chars in it.


        Stefan


* Re: Can't make emacs use utf-8?? (on WinXP)
  2003-06-04 13:16       ` Bjoern
  2003-06-04 14:19         ` Robin Hu
  2003-06-04 14:59         ` Benjamin Riefenstahl
@ 2003-06-04 20:21         ` Jason Rumney
  2003-06-10 17:53         ` Piet van Oostrum
  3 siblings, 0 replies; 14+ messages in thread
From: Jason Rumney @ 2003-06-04 20:21 UTC


Bjoern <p2@blinker.net> writes:

> > There are even those, like me, that believe that the language of
> > identifiers and comments should be simple English anyway, to maximize
> > reusability.
> 
> True, I wasn't even thinking about that, but what about
> 
> System.out.println("some non-ASCII text here");

This is better suited to a Java newsgroup, but the answer is it
depends.

By default, the Java compiler uses the user's default locale to guess
the encoding, which in a large enough team is somewhat random, but it
can be forced to use a particular encoding with a compiler
switch.  There is also some randomness in what encoding other
people's editors will write code out as.

So if you are a team of one, you can probably manage to set your
environment up so that non-ASCII just works, but as soon as other
people start working on the code you are opening a can of worms by
allowing non-ASCII characters in the source without encoding them
as \uNNNN.
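
To make the can of worms concrete, here is a small sketch (class and
method names are hypothetical) of what happens when text written in one
encoding is read back in another, which is in effect what the compiler
does when it guesses wrong. The switch in question is javac's -encoding
option, e.g. "javac -encoding UTF-8 Foo.java" (Foo.java being a
placeholder file name).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMismatch {
    // Encodes the text with one charset and decodes the resulting bytes
    // with another, as happens when an editor and a compiler disagree.
    static String reread(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        // u-umlaut followed by "ber", written with an escape so this
        // source file itself is pure ASCII.
        String original = "\u00fcber";
        String sameCharset = reread(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mismatch = reread(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(sameCharset.equals(original));
        System.out.println(mismatch.equals(original));
        System.out.println(mismatch.length());
    }
}
```

This prints true, false and 6: the two UTF-8 bytes of the umlaut are
read as two separate Latin-1 characters, so the four-character string
comes back as six characters.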


* Re: Can't make emacs use utf-8?? (on WinXP)
  2003-06-02 23:55 Can't make emacs use utf-8?? (on WinXP) Bjoern
                   ` (2 preceding siblings ...)
  2003-06-03 18:27 ` Eli Zaretskii
@ 2003-06-05 14:14 ` Stefan Monnier
  2003-06-06  8:58   ` Bjoern
  3 siblings, 1 reply; 14+ messages in thread
From: Stefan Monnier @ 2003-06-05 14:14 UTC


>>>>> "Bjoern" == Bjoern  <p2@blinker.net> writes:
> their encoding is still 'undecided-dos' when I reopen them.

Does that just bother you philosophically, or does it create a real
problem somewhere ?
I feel like you're worrying about something irrelevant.


        Stefan


* Re: Can't make emacs use utf-8?? (on WinXP)
  2003-06-05 14:14 ` Stefan Monnier
@ 2003-06-06  8:58   ` Bjoern
  0 siblings, 0 replies; 14+ messages in thread
From: Bjoern @ 2003-06-06  8:58 UTC


Stefan Monnier wrote:
>>>>>>"Bjoern" == Bjoern  <p2@blinker.net> writes:
>>
>>their encoding is still 'undecided-dos' when I reopen them.
>  
> Does that just bother you philosophically, or does it create a real
> problem somewhere ?
> I feel like you're worrying about something irrelevant.

I didn't understand encodings well enough at the time to realise that it 
doesn't matter. Although I still think that behaviour is a bit confusing.


Bjoern


* Re: Can't make emacs use utf-8?? (on WinXP)
  2003-06-04 13:16       ` Bjoern
                           ` (2 preceding siblings ...)
  2003-06-04 20:21         ` Jason Rumney
@ 2003-06-10 17:53         ` Piet van Oostrum
  3 siblings, 0 replies; 14+ messages in thread
From: Piet van Oostrum @ 2003-06-10 17:53 UTC


>>>>> Bjoern <p2@blinker.net> (B) wrote:

B> True, I wasn't even thinking about that, but what about

B> System.out.println("some non-ASCII text here");

Well I tried that inside emacs (with utf-8 as default):

class testuni {
    public static void main(String[] args) {
        // String literal with non-ASCII characters, saved from Emacs as UTF-8.
        System.out.println("Eén überhaupt € å…æÆ\n");
    }
}

TEMP> javac testuni.java
TEMP> java testuni
Eén überhaupt € å…æÆ

-- 
Piet van Oostrum <piet@cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: P.van.Oostrum@hccnet.nl
