* Re: Convert UTF-8
2008-12-17 8:41 ` YOUNG
@ 2008-12-17 9:59 ` Thierry Volpiatto
2008-12-17 11:17 ` Giorgos Keramidas
2008-12-17 12:04 ` Xah Lee
2 siblings, 0 replies; 10+ messages in thread
From: Thierry Volpiatto @ 2008-12-17 9:59 UTC (permalink / raw)
To: help-gnu-emacs
YOUNG <breadncup@gmail.com> writes:
> On Dec 16, 11:54 pm, Harald Hanche-Olsen <han...@math.ntnu.no> wrote:
>> + Andreas Politz <poli...@fh-trier.de>:
>>
>> > YOUNG wrote:
>>
>> >> I have a Emacs 22.3.1 for Windows XP, and there is a file encoded in
>> >> ASCII. I am trying to read the file and convert it to UTF-8 with
>> >> emacs.
>>
>> > If I am not mistaken, converting a ASCII file to UTF-8 is an identity
>> > operation, since the later is backwards compatible to the former. So
>> > there would be nothing to convert.
>>
>> You are not at all mistaken of course, but many people take "ASCII" to
>> mean their favourite eight bit character set (typically Latin 1 or 9 in
>> western Europe).
>>
>> But since the OP reports no change to his files, maybe they really were
>> proper ASCII to begin with. Or maybe he is confused about how to make
>> emacs use UTF-8 when loading the file? If so, he could do worse than
>> read the emacs info file, node "Recognize coding".
>>
>> --
>> * Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/>
>> - It is undesirable to believe a proposition
>> when there is no ground whatsoever for supposing it is true.
>> -- Bertrand Russell
>
> Well, I have no problem to load UTF-8 file with emacs at all.
>
> The problem is that emacs is not able to write UTF-8 at all.
>
> For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> Latin 1 to 9; there are various aliases to indicating of it, but you
> already know what it means.), I set it up with
> M-x set-buffer-file-coding-system for writing utf-8 encoding. And,
> write (or save) it.
> After that, exit the emacs and re-run it again, and try to read the
> saved file to be expected UTF-8 encoding, but it reads again in ASCII.
> It does not mean emacs can't read utf-8, but the file itself is not
> encoded UTF-8. I check the file's encoding system with other
> application like NotePAD++ or other editors, and all say the file is
> still ASCII mode even though I write it as utf-8 in emacs.
>
> Again, there is no problem in reading utf-8. When a file is encoded
> utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> emacs is not able to write utf-8 if the file is encoded in ASCII. It
> only writes in ASCII encode no matter how I do
> "set-buffer-file-coding-system"
>
> So, if somebody knows this issue and how to write utf-8 correctly when
> a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> information, it would be appreciated.
I was using iso-8859-15 before switching my system to utf-8.
I just put this at the top of my files (note: not -*- utf-8 encoding -*-):
,----
| # -*- coding: utf-8 -*-
`----
instead of
,----
| # -*- coding: iso-8859-15 -*-
`----
--
A + Thierry Volpiatto
Location: Saint-Cyr-Sur-Mer - France
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Convert UTF-8
2008-12-17 8:41 ` YOUNG
2008-12-17 9:59 ` Thierry Volpiatto
@ 2008-12-17 11:17 ` Giorgos Keramidas
2008-12-17 12:04 ` Xah Lee
2 siblings, 0 replies; 10+ messages in thread
From: Giorgos Keramidas @ 2008-12-17 11:17 UTC (permalink / raw)
To: help-gnu-emacs
On Wed, 17 Dec 2008 00:41:47 -0800 (PST), YOUNG <breadncup@gmail.com> wrote:
> Well, I have no problem to load UTF-8 file with emacs at all.
>
> The problem is that emacs is not able to write UTF-8 at all.
>
> For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> Latin 1 to 9; there are various aliases to indicating of it, but you
> already know what it means.), I set it up with
> M-x set-buffer-file-coding-system for writing utf-8 encoding. And,
> write (or save) it.
> After that, exit the emacs and re-run it again, and try to read the
> saved file to be expected UTF-8 encoding, but it reads again in ASCII.
> It does not mean emacs can't read utf-8, but the file itself is not
> encoded UTF-8. I check the file's encoding system with other
> application like NotePAD++ or other editors, and all say the file is
> still ASCII mode even though I write it as utf-8 in emacs.
ASCII contains only 7-bit characters. Every character of the 7-bit
ASCII character set maps to the same single byte in the UTF-8 coding
system. This means that when a file contains only characters from the
ASCII character set, no conversion at all is needed from UTF-8 to ASCII
or vice versa: the bytes on disk are identical, so other tools will keep
reporting the file as ASCII.
If you set buffer-file-coding-system to UTF-8 *and* type some text that
contains at least one non-ASCII character (one that does not fit in 7
bits), then the saved bytes will differ from plain ASCII and the file
will be recognizable as UTF-8.
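As a rough illustration of the point above (a Python sketch rather than
anything Emacs does internally; the sample strings are made up), pure-ASCII
text produces the same bytes under both encodings, while a single non-ASCII
character makes the UTF-8 output differ:

```python
# Pure ASCII text: the ASCII and UTF-8 encodings are byte-for-byte identical.
ascii_text = "plain text"
ascii_bytes = ascii_text.encode("ascii")
utf8_bytes = ascii_text.encode("utf-8")
same_for_ascii = ascii_bytes == utf8_bytes

# One non-ASCII character (e with acute accent, U+00E9) takes two bytes
# in UTF-8, so the result is no longer valid 7-bit ASCII.
accented = "caf\u00e9"
accented_utf8 = accented.encode("utf-8")

print(same_for_ascii)   # True
print(accented_utf8)    # b'caf\xc3\xa9'
```

This is why an external tool that guesses the encoding from the bytes
cannot tell an all-ASCII "UTF-8" file from an ASCII one.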
> Again, there is no problem in reading utf-8. When a file is encoded
> utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> emacs is not able to write utf-8 if the file is encoded in ASCII. It
> only writes in ASCII encode no matter how I do
> "set-buffer-file-coding-system"
>
> So, if somebody knows this issue and how to write utf-8 correctly when
> a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> information, it would be appreciated.
CP437 is very different from plain ASCII. It contains 8-bit characters
and there are other differences in the 0x00 - 0x1F code range. If you
ignore the 0x00-0x1F character differences you might be able to say that
CP437 is a 'superset' of ASCII, but they are not the same thing.
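For a file that really is in an 8-bit code page, conversion to UTF-8 does
change the bytes. A minimal Python sketch, assuming illustrative byte
values (`cp437` and `latin-1` here are Python's codec names, not Emacs
coding-system names):

```python
# The same character, e with acute accent (U+00E9), in two 8-bit encodings.
cp437_bytes = b"caf\x82"    # CP437 stores it as 0x82
latin1_bytes = b"caf\xe9"   # ISO 8859-1 stores it as 0xE9

# Decode with the encoding the file was actually written in,
# then re-encode the resulting text as UTF-8.
from_cp437 = cp437_bytes.decode("cp437").encode("utf-8")
from_latin1 = latin1_bytes.decode("latin-1").encode("utf-8")

print(from_cp437 == from_latin1)   # True
print(from_cp437)                  # b'caf\xc3\xa9'
```

The key step is decoding with the right source encoding; reading CP437
bytes as Latin 1 (or the reverse) would yield a different character.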
* Re: Convert UTF-8
2008-12-17 8:41 ` YOUNG
2008-12-17 9:59 ` Thierry Volpiatto
2008-12-17 11:17 ` Giorgos Keramidas
@ 2008-12-17 12:04 ` Xah Lee
2008-12-18 8:35 ` YOUNG
2 siblings, 1 reply; 10+ messages in thread
From: Xah Lee @ 2008-12-17 12:04 UTC (permalink / raw)
To: help-gnu-emacs
On Dec 17, 12:41 am, YOUNG <breadn...@gmail.com> wrote:
> Well, I have no problem to load UTF-8 file with emacs at all.
>
> The problem is that emacs is not able to write UTF-8 at all.
>
> For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> Latin 1 to 9; there are various aliases to indicating of it, but you
> already know what it means.), I set it up with
> M-x set-buffer-file-coding-system for writing utf-8 encoding. And,
> write (or save) it.
> After that, exit the emacs and re-run it again, and try to read the
> saved file to be expected UTF-8 encoding, but it reads again in ASCII.
> It does not mean emacs can't read utf-8, but the file itself is not
> encoded UTF-8. I check the file's encoding system with other
> application like NotePAD++ or other editors, and all say the file is
> still ASCII mode even though I write it as utf-8 in emacs.
>
> Again, there is no problem in reading utf-8. When a file is encoded
> utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> emacs is not able to write utf-8 if the file is encoded in ASCII. It
> only writes in ASCII encode no matter how I do
> "set-buffer-file-coding-system"
>
> So, if somebody knows this issue and how to write utf-8 correctly when
> a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> information, it would be appreciated.
>
> Thanks,
As others have mentioned, UTF-8 is a superset of ASCII, so a file that
contains only ASCII characters is byte-for-byte identical in either
encoding.
You mentioned ISO8859, which is not ASCII. I read your two posts, but I
don't quite understand what you want.
For some tips on Unicode with Emacs, you might check out:
• Emacs and Unicode Tips
http://xahlee.org/emacs/emacs_n_unicode.html
You might also beef up your understanding of character encodings:
http://en.wikipedia.org/wiki/ISO8859
http://en.wikipedia.org/wiki/ASCII
http://en.wikipedia.org/wiki/UTF-8
Xah
∑ http://xahlee.org/
☄
* Re: Convert UTF-8
2008-12-17 12:04 ` Xah Lee
@ 2008-12-18 8:35 ` YOUNG
2008-12-18 14:56 ` Harald Hanche-Olsen
0 siblings, 1 reply; 10+ messages in thread
From: YOUNG @ 2008-12-18 8:35 UTC (permalink / raw)
To: help-gnu-emacs
On Dec 17, 4:04 am, Xah Lee <xah...@gmail.com> wrote:
> On Dec 17, 12:41 am, YOUNG <breadn...@gmail.com> wrote:
>
>
>
> > Well, I have no problem to load UTF-8 file with emacs at all.
>
> > The problem is that emacs is not able to write UTF-8 at all.
>
> > For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> > Latin 1 to 9; there are various aliases to indicating of it, but you
> > already know what it means.), I set it up with
> > M-x set-buffer-file-coding-system for writing utf-8 encoding. And,
> > write (or save) it.
> > After that, exit the emacs and re-run it again, and try to read the
> > saved file to be expected UTF-8 encoding, but it reads again in ASCII.
> > It does not mean emacs can't read utf-8, but the file itself is not
> > encoded UTF-8. I check the file's encoding system with other
> > application like NotePAD++ or other editors, and all say the file is
> > still ASCII mode even though I write it as utf-8 in emacs.
>
> > Again, there is no problem in reading utf-8. When a file is encoded
> > utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> > emacs is not able to write utf-8 if the file is encoded in ASCII. It
> > only writes in ASCII encode no matter how I do
> > "set-buffer-file-coding-system"
>
> > So, if somebody knows this issue and how to write utf-8 correctly when
> > a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> > information, it would be appreciated.
>
> > Thanks,
>
> as other have mentioned, utf-8 is just a super set of ascii, so files
> encoded in either are identical.
>
> You mentioned ISO8859, which is not ascii. I read your 2 posts, but
> don't quite understand what you wanted.
>
> For some unicode with emacs tips, you might checkout:
>
> • Emacs and Unicode Tips
> http://xahlee.org/emacs/emacs_n_unicode.html
>
> You might also beefup understanding of char encoding:
>
> http://en.wikipedia.org/wiki/ISO8859
> http://en.wikipedia.org/wiki/ASCII
> http://en.wikipedia.org/wiki/UTF-8
>
> Xah
> ∑ http://xahlee.org/
>
> ☄
Hi,
Finally, I know what the problem is. Thank you all for helping with this
issue.
I am not an expert on encoding systems, so I appreciate this opportunity
to learn about them.
The problem is the BOM (Byte Order Mark). In UTF-8 it is usually
avoided, since a BOM header can cause a conflict when special characters
are expected at the very start of the file, like '#!' in a Unix shell
script. Therefore, if the text contains no character that needs at least
8 bits to be represented in UTF-8, the text encoding is undefined, or
ASCII (I am not sure if that is the right term, but let's call it ASCII
here for convenience), in emacs.
I could conclude emacs does not have the feature of having BOM in
utf-8. It only supports utf-8 without BOM. So I can understand why the
file was not recognized as UTF-8 when the text did not contain any
actual non-ASCII characters. If the text contains such a character and
is saved as UTF-8, then there is no problem in writing UTF-8 without a
BOM.
Detailed information about Unicode and the BOM can be found at
http://unicode.org/faq/utf_bom.html
Thank you,
* Re: Convert UTF-8
2008-12-18 8:35 ` YOUNG
@ 2008-12-18 14:56 ` Harald Hanche-Olsen
0 siblings, 0 replies; 10+ messages in thread
From: Harald Hanche-Olsen @ 2008-12-18 14:56 UTC (permalink / raw)
To: help-gnu-emacs
+ YOUNG <breadncup@gmail.com>:
> I could conclude emacs does not have the feature of having BOM in
> utf-8. It only supports utf-8 without BOM.
Not true. But you have to put the BOM (ZERO WIDTH NO-BREAK SPACE,
really) there yourself, since otherwise as you noted (in the elided
text) it can play havoc with shell scripts etc. If you want, e.g., every
file that is visited in text mode to start with a BOM you can add a hook
function to before-save-hook that ensures this before saving.
Also, at least the emacsen I am currently using (version 23 from CVS)
will recognize an initial BOM and automagically pick the utf-8 encoding
when it sees the corresponding three bytes at the top of the file.
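To make the "corresponding three bytes" concrete, here is a small Python
sketch (not Emacs code; the `has_utf8_bom` helper and the sample script
are hypothetical, for illustration only). It shows the UTF-8 encoding of
U+FEFF, and why a file saved with a BOM no longer starts with '#!':

```python
import codecs

# U+FEFF (ZERO WIDTH NO-BREAK SPACE) encoded in UTF-8 is the three-byte BOM.
bom = "\ufeff".encode("utf-8")

def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)

script = "#!/bin/sh\necho hello\n"
without_bom = script.encode("utf-8")      # identical to the ASCII bytes
with_bom = script.encode("utf-8-sig")     # same text with the BOM prepended

print(bom)                                # b'\xef\xbb\xbf'
print(has_utf8_bom(without_bom))          # False
print(has_utf8_bom(with_bom))             # True
print(with_bom.startswith(b"#!"))         # False
```

The last line is the shell-script problem in miniature: the kernel looks
for '#!' in the first two bytes and finds the start of the BOM instead.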
> Detailed information about unicode and BOM is found in
> http://unicode.org/faq/utf_bom.html
The use of zero width no-break space as a marker to indicate coding is
also widely regarded as unwise. I am too lazy to find any of the
references that will support my claim, so take it with a grain of salt
if you will.
--
* Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/>
- It is undesirable to believe a proposition
when there is no ground whatsoever for supposing it is true.
-- Bertrand Russell