* Convert UTF-8
@ 2008-12-17 1:05 YOUNG
2008-12-17 2:27 ` Andreas Politz
2008-12-17 10:43 ` Peter Dyballa
0 siblings, 2 replies; 10+ messages in thread
From: YOUNG @ 2008-12-17 1:05 UTC (permalink / raw)
To: help-gnu-emacs
Hi,
I have Emacs 22.3.1 on Windows XP, and there is a file encoded in
ASCII. I am trying to read the file and convert it to UTF-8 with
Emacs.
I have tried
M-x set-buffer-file-coding-system
and selected utf-8. I checked that the indicator in the mode line
changed to 'u', and since the buffer is now marked as modified, it
shows '**' as well.
So I saved the file using "C-x s".
That seemed to work. I then exited Emacs, started it again, and
reopened the file. However, the file had not been converted at all.
Here is the output of "describe-current-coding-system":
----------------------
Coding system for saving this buffer:
- -- undecided-dos
Default coding system (for new files):
u -- mule-utf-8 (alias: utf-8)
Coding system for keyboard input:
* -- cp1252 (alias of windows-1252)
Coding system for terminal output:
* -- cp1252 (alias of windows-1252)
Defaults for subprocess I/O:
decoding: u -- mule-utf-8-dos
encoding: u -- mule-utf-8-unix
Priority order for recognizing coding systems when reading files:
1. mule-utf-8 (alias: utf-8)
2. iso-latin-1 (alias: iso-8859-1 latin-1)
3. mule-utf-16be-with-signature (alias: utf-16be-with-signature mule-utf-16-be utf-16-be)
4. mule-utf-16le-with-signature (alias: utf-16le-with-signature mule-utf-16-le utf-16-le)
5. iso-2022-jp (alias: junet)
6. iso-2022-7bit
7. iso-2022-7bit-lock (alias: iso-2022-int-1)
8. iso-2022-8bit-ss2
9. emacs-mule
10. raw-text
11. japanese-shift-jis (alias: shift_jis sjis cp932)
12. chinese-big5 (alias: big5 cn-big5 cp950)
13. no-conversion
Other coding systems cannot be distinguished automatically
from these, and therefore cannot be recognized automatically
with the present coding system priorities.
The following are decoded correctly but recognized as iso-2022-7bit-lock:
  iso-2022-7bit-ss2 iso-2022-7bit-lock-ss2 iso-2022-cn iso-2022-cn-ext
  iso-2022-jp-2 iso-2022-kr
Particular coding systems specified for certain file names:
OPERATION    TARGET PATTERN                       CODING SYSTEM(s)
---------    --------------                       ----------------
File I/O     "\\.dz\\'"                           (no-conversion . no-conversion)
             "\\.g?z\\(~\\|\\.~[0-9]+~\\)?\\'"    (no-conversion . no-conversion)
             "\\.tgz\\'"                          (no-conversion . no-conversion)
             "\\.tbz\\'"                          (no-conversion . no-conversion)
             "\\.bz2\\(~\\|\\.~[0-9]+~\\)?\\'"    (no-conversion . no-conversion)
             "\\.Z\\(~\\|\\.~[0-9]+~\\)?\\'"      (no-conversion . no-conversion)
             "\\.elc\\'"                          (emacs-mule . emacs-mule)
             "\\.utf\\(-8\\)?\\'"                 utf-8
             "\\(\\`\\|/\\)loaddefs.el\\'"        (raw-text . raw-text-unix)
             "\\.tar\\'"                          (no-conversion . no-conversion)
             "\\.po[tx]?\\'\\|\\.po\\."           po-find-file-coding-system
             "\\.\\(tex\\|ltx\\|dtx\\|drv\\)\\'"  latexenc-find-file-coding-system
             ""                                   find-buffer-file-type-coding-system
Process I/O  "[pP][lL][iI][nN][kK]"               (undecided-dos . undecided-dos)
             "[cC][mM][dD][pP][rR][oO][xX][yY]"   (undecided-dos . undecided-dos)
Network I/O  nothing specified
----------------------
Do you know how to convert a file to UTF-8 using emacs, please?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Convert UTF-8
2008-12-17 1:05 Convert UTF-8 YOUNG
@ 2008-12-17 2:27 ` Andreas Politz
2008-12-17 7:54 ` Harald Hanche-Olsen
2008-12-17 10:43 ` Peter Dyballa
1 sibling, 1 reply; 10+ messages in thread
From: Andreas Politz @ 2008-12-17 2:27 UTC (permalink / raw)
To: help-gnu-emacs
YOUNG wrote:
> Hi,
>
> I have a Emacs 22.3.1 for Windows XP, and there is a file encoded in
> ASCII. I am trying to read the file and convert it to UTF-8 with
> emacs.
>
If I am not mistaken, converting an ASCII file to UTF-8 is an identity operation,
since the latter is backwards compatible with the former. So there would be nothing
to convert.
-ap
> I have tried
>
> M-x set-buffer-file-coding-system
>
> and set up utf-8 and check it has changed to 'u' in status bar, and
> since buffer has changed, it shows '**' as well.
>
> So, I write the file using "C-x s".
>
> It seems to fine. So, I exit the emacs, and rerun the emacs again and
> read the file, too. However, the file is not converted at all.
>
> Here is when I did "describe-current-coding-system"
>
> ----------------------
>
> Coding system for saving this buffer:
> - -- undecided-dos
>
> Default coding system (for new files):
> u -- mule-utf-8 (alias: utf-8)
>
> Coding system for keyboard input:
> * -- cp1252 (alias of windows-1252)
>
> Coding system for terminal output:
> * -- cp1252 (alias of windows-1252)
>
> Defaults for subprocess I/O:
> decoding: u -- mule-utf-8-dos
>
> encoding: u -- mule-utf-8-unix
>
>
> Priority order for recognizing coding systems when reading files:
> 1. mule-utf-8 (alias: utf-8)
> 2. iso-latin-1 (alias: iso-8859-1 latin-1)
> 3. mule-utf-16be-with-signature (alias: utf-16be-with-signature mule-
> utf-16-be utf-16-be)
> 4. mule-utf-16le-with-signature (alias: utf-16le-with-signature mule-
> utf-16-le utf-16-le)
> 5. iso-2022-jp (alias: junet)
> 6. iso-2022-7bit
> 7. iso-2022-7bit-lock (alias: iso-2022-int-1)
> 8. iso-2022-8bit-ss2
> 9. emacs-mule
> 10. raw-text
> 11. japanese-shift-jis (alias: shift_jis sjis cp932)
> 12. chinese-big5 (alias: big5 cn-big5 cp950)
> 13. no-conversion
>
> Other coding systems cannot be distinguished automatically
> from these, and therefore cannot be recognized automatically
> with the present coding system priorities.
>
> The following are decoded correctly but recognized as iso-2022-7bit-
> lock:
> iso-2022-7bit-ss2 iso-2022-7bit-lock-ss2 iso-2022-cn iso-2022-cn-
> ext iso-2022-jp-2 iso-2022-kr
>
> Particular coding systems specified for certain file names:
>
> OPERATION TARGET PATTERN CODING SYSTEM(s)
> --------- -------------- ----------------
> File I/O "\\.dz\\'" (no-conversion . no-conversion)
> "\\.g?z\\(~\\|\\.~[0-9]+~\\)?\\'"
> (no-conversion . no-conversion)
> "\\.tgz\\'" (no-conversion . no-conversion)
> "\\.tbz\\'" (no-conversion . no-conversion)
> "\\.bz2\\(~\\|\\.~[0-9]+~\\)?\\'"
> (no-conversion . no-conversion)
> "\\.Z\\(~\\|\\.~[0-9]+~\\)?\\'"
> (no-conversion . no-conversion)
> "\\.elc\\'" (emacs-mule . emacs-mule)
> "\\.utf\\(-8\\)?\\'" utf-8
> "\\(\\`\\|/\\)loaddefs.el\\'"
> (raw-text . raw-text-unix)
> "\\.tar\\'" (no-conversion . no-conversion)
> "\\.po[tx]?\\'\\|\\.po\\."
> po-find-file-coding-system
> "\\.\\(tex\\|ltx\\|dtx\\|drv\\)\\'"
> latexenc-find-file-coding-system
> "" find-buffer-file-type-coding-system
> Process I/O "[pP][lL][iI][nN][kK]" (undecided-dos . undecided-dos)
> "[cC][mM][dD][pP][rR][oO][xX][yY]"
> (undecided-dos . undecided-dos)
> Network I/O nothing specified
> ----------------------
>
> Do you know how to convert a file to UTF-8 using emacs, please?
>
* Re: Convert UTF-8
2008-12-17 2:27 ` Andreas Politz
@ 2008-12-17 7:54 ` Harald Hanche-Olsen
2008-12-17 8:41 ` YOUNG
0 siblings, 1 reply; 10+ messages in thread
From: Harald Hanche-Olsen @ 2008-12-17 7:54 UTC (permalink / raw)
To: help-gnu-emacs
+ Andreas Politz <politza@fh-trier.de>:
> YOUNG wrote:
>>
>> I have a Emacs 22.3.1 for Windows XP, and there is a file encoded in
>> ASCII. I am trying to read the file and convert it to UTF-8 with
>> emacs.
>>
>
> If I am not mistaken, converting a ASCII file to UTF-8 is an identity
> operation, since the later is backwards compatible to the former. So
> there would be nothing to convert.
You are not at all mistaken of course, but many people take "ASCII" to
mean their favourite eight-bit character set (typically Latin 1 or 9 in
western Europe).
But since the OP reports no change to his files, maybe they really were
proper ASCII to begin with. Or maybe he is confused about how to make
emacs use UTF-8 when loading the file? If so, he could do worse than
read the emacs info file, node "Recognize coding".
--
* Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/>
- It is undesirable to believe a proposition
when there is no ground whatsoever for supposing it is true.
-- Bertrand Russell
* Re: Convert UTF-8
2008-12-17 7:54 ` Harald Hanche-Olsen
@ 2008-12-17 8:41 ` YOUNG
2008-12-17 9:59 ` Thierry Volpiatto
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: YOUNG @ 2008-12-17 8:41 UTC (permalink / raw)
To: help-gnu-emacs
On Dec 16, 11:54 pm, Harald Hanche-Olsen <han...@math.ntnu.no> wrote:
> + Andreas Politz <poli...@fh-trier.de>:
>
> > YOUNG wrote:
>
> >> I have a Emacs 22.3.1 for Windows XP, and there is a file encoded in
> >> ASCII. I am trying to read the file and convert it to UTF-8 with
> >> emacs.
>
> > If I am not mistaken, converting a ASCII file to UTF-8 is an identity
> > operation, since the later is backwards compatible to the former. So
> > there would be nothing to convert.
>
> You are not at all mistaken of course, but many people take "ASCII" to
> mean their favourite eight bit character set (typically Latin 1 or 9 in
> western Europe).
>
> But since the OP reports no change to his files, maybe they really were
> proper ASCII to begin with. Or maybe he is confused about how to make
> emacs use UTF-8 when loading the file? If so, he could do worse than
> read the emacs info file, node "Recognize coding".
>
> --
> * Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/>
> - It is undesirable to believe a proposition
> when there is no ground whatsoever for supposing it is true.
> -- Bertrand Russell
Well, I have no problem loading a UTF-8 file with Emacs at all.
The problem is that Emacs does not seem to be able to write UTF-8.
For example, if a file is encoded in ASCII (or CP437, or ISO 8859, or
Latin 1 to 9; there are various names for these, but you already know
what I mean), I use M-x set-buffer-file-coding-system to select utf-8
for writing, and then save the file. After that, I exit Emacs, start
it again, and read the saved file, expecting it to be UTF-8 encoded,
but it is read as ASCII again. This does not mean Emacs can't read
utf-8; the file itself is not encoded in UTF-8. I checked the file's
encoding with other applications such as Notepad++ and other editors,
and all of them say the file is still ASCII even though I wrote it as
utf-8 in Emacs.
Again, there is no problem reading utf-8. When a file is correctly
encoded in utf-8, Emacs reads and writes it in utf-8. That's good.
However, Emacs does not write utf-8 if the file is encoded in ASCII.
It only writes ASCII no matter how I use
"set-buffer-file-coding-system".
So, if somebody knows about this issue and how to write utf-8
correctly when a file is encoded in ISO 8859 (or CP437 or ASCII), I
would appreciate it if you shared the information.
Thanks,
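
[Editor's sketch] The reports from other editors are what one would expect if they classify a file purely by its bytes. The Python below is a minimal sketch of such a check, not the actual detection logic of Notepad++ or Emacs; the function name `describe_encoding` and its labels are hypothetical. It shows that a file with no BOM and no byte above 0x7F cannot be told apart from plain ASCII, whatever coding system was used to save it.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Classify raw file bytes the way a simple byte-based detector might."""
    # A leading EF BB BF is the UTF-8 encoding of U+FEFF, the BOM.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    # Only 7-bit bytes: valid ASCII and, at the same time, valid UTF-8.
    if all(byte < 0x80 for byte in data):
        return "ASCII (also valid UTF-8)"
    # Bytes above 0x7F are present: check whether they form valid UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "some other 8-bit encoding"
    return "UTF-8 without BOM"


if __name__ == "__main__":
    # ASCII-only text saved "as UTF-8" is still byte-for-byte ASCII.
    print(describe_encoding("hello\r\n".encode("utf-8")))
    # One non-ASCII character is enough to make the file recognizable.
    print(describe_encoding("caf\u00e9\r\n".encode("utf-8")))
```

Under these assumptions, the first call reports ASCII even though the text was encoded with the UTF-8 codec, which matches the behaviour described in this message.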
* Re: Convert UTF-8
2008-12-17 8:41 ` YOUNG
@ 2008-12-17 9:59 ` Thierry Volpiatto
2008-12-17 11:17 ` Giorgos Keramidas
2008-12-17 12:04 ` Xah Lee
2 siblings, 0 replies; 10+ messages in thread
From: Thierry Volpiatto @ 2008-12-17 9:59 UTC (permalink / raw)
To: help-gnu-emacs
YOUNG <breadncup@gmail.com> writes:
> On Dec 16, 11:54 pm, Harald Hanche-Olsen <han...@math.ntnu.no> wrote:
>> + Andreas Politz <poli...@fh-trier.de>:
>>
>> > YOUNG wrote:
>>
>> >> I have a Emacs 22.3.1 for Windows XP, and there is a file encoded in
>> >> ASCII. I am trying to read the file and convert it to UTF-8 with
>> >> emacs.
>>
>> > If I am not mistaken, converting a ASCII file to UTF-8 is an identity
>> > operation, since the later is backwards compatible to the former. So
>> > there would be nothing to convert.
>>
>> You are not at all mistaken of course, but many people take "ASCII" to
>> mean their favourite eight bit character set (typically Latin 1 or 9 in
>> western Europe).
>>
>> But since the OP reports no change to his files, maybe they really were
>> proper ASCII to begin with. Or maybe he is confused about how to make
>> emacs use UTF-8 when loading the file? If so, he could do worse than
>> read the emacs info file, node "Recognize coding".
>>
>> --
>> * Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/>
>> - It is undesirable to believe a proposition
>> when there is no ground whatsoever for supposing it is true.
>> -- Bertrand Russell
>
> Well, I have no problem to load UTF-8 file with emacs at all.
>
> The problem is that emacs is not able to write UTF-8 at all.
>
> For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> Latin 1 to 9; there are various aliases to indicating of it, but you
> already know what it means.), I set it up with M-x set-buffer-file-
> coding-system for writing utf-8 encoding. And, write (or save) it.
> After that, exit the emacs and re-run it again, and try to read the
> saved file to be expected UTF-8 encoding, but it reads again in ASCII.
> It does not mean emacs can't read utf-8, but the file itself is not
> encoded UTF-8. I check the file's encoding system with other
> application like NotePAD++ or other editors, and all say the file is
> still ASCII mode even though I write it as utf-8 in emacs.
>
> Again, there is no problem in reading utf-8. When a file is encoded
> utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> emacs is not able to write utf-8 if the file is encoded in ASCII. It
> only writes in ASCII encode no matter how I do "set-buffer-file-coding-
> system"
>
> So, if somebody knows this issue and how to write utf-8 correctly when
> a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> information, it would be appreciated.
I was using iso-8859-15 before switching my system to utf-8.
I just put this coding cookie in my files (note: not -*- utf-8 encoding -*-):
,----
| # -*- coding: utf-8 -*-
`----
instead of
,----
| # -*- coding: iso-8859-15 -*-
`----
--
A + Thierry Volpiatto
Location: Saint-Cyr-Sur-Mer - France
* Re: Convert UTF-8
2008-12-17 1:05 Convert UTF-8 YOUNG
2008-12-17 2:27 ` Andreas Politz
@ 2008-12-17 10:43 ` Peter Dyballa
1 sibling, 0 replies; 10+ messages in thread
From: Peter Dyballa @ 2008-12-17 10:43 UTC (permalink / raw)
To: YOUNG; +Cc: help-gnu-emacs
Am 17.12.2008 um 02:05 schrieb YOUNG:
> I have a Emacs 22.3.1 for Windows XP, and there is a file encoded in
> ASCII. I am trying to read the file and convert it to UTF-8 with
> emacs.
You could also try:
(prefer-coding-system 'utf-8)
It's a global option. The best way to set GNU Emacs' behaviour in this
area is through environment variables such as LANG or LC_CTYPE that
name a UTF-8 based locale. I don't know how this is handled in MS Losedos.
From the Options menu, choose Mule and then "Set Coding Systems";
there, "For Next Command" (C-x RET c) lets you specify a particular
coding system for reading the file.
--
Greetings
Pete
Bake pizza not war!
* Re: Convert UTF-8
2008-12-17 8:41 ` YOUNG
2008-12-17 9:59 ` Thierry Volpiatto
@ 2008-12-17 11:17 ` Giorgos Keramidas
2008-12-17 12:04 ` Xah Lee
2 siblings, 0 replies; 10+ messages in thread
From: Giorgos Keramidas @ 2008-12-17 11:17 UTC (permalink / raw)
To: help-gnu-emacs
On Wed, 17 Dec 2008 00:41:47 -0800 (PST), YOUNG <breadncup@gmail.com> wrote:
> Well, I have no problem to load UTF-8 file with emacs at all.
>
> The problem is that emacs is not able to write UTF-8 at all.
>
> For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> Latin 1 to 9; there are various aliases to indicating of it, but you
> already know what it means.), I set it up with M-x set-buffer-file-
> coding-system for writing utf-8 encoding. And, write (or save) it.
> After that, exit the emacs and re-run it again, and try to read the
> saved file to be expected UTF-8 encoding, but it reads again in ASCII.
> It does not mean emacs can't read utf-8, but the file itself is not
> encoded UTF-8. I check the file's encoding system with other
> application like NotePAD++ or other editors, and all say the file is
> still ASCII mode even though I write it as utf-8 in emacs.
ASCII contains only 7-bit characters. All the characters of the 7-bit
ASCII character set map to themselves in the UTF-8 coding system.
This means that when a file contains only characters from the ASCII
character set no conversion at all is needed from UTF-8 to ASCII or vice
versa.
If you set the buffer-file-coding-system to UTF-8 *and* type some text
that requires at least 8 bits to be represented correctly in UTF-8,
then the file will be saved in UTF-8.
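
[Editor's sketch] The point about 7-bit characters mapping to themselves can be checked outside Emacs. The Python below is only an illustration; the helper name `is_identical_in_ascii_and_utf8` is hypothetical. It compares the bytes produced by the two codecs and shows what changes once a non-ASCII character is present.

```python
def is_identical_in_ascii_and_utf8(text: str) -> bool:
    """Return True if text encodes to the same bytes in ASCII and UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains a character outside the 7-bit ASCII range,
        # so it has no ASCII encoding at all.
        return False


if __name__ == "__main__":
    print(is_identical_in_ascii_and_utf8("hello world\n"))  # True
    print(is_identical_in_ascii_and_utf8("caf\u00e9"))      # False
    # U+00E9 (e with acute accent) takes two bytes in UTF-8.
    print("caf\u00e9".encode("utf-8"))                      # b'caf\xc3\xa9'
```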
> Again, there is no problem in reading utf-8. When a file is encoded
> utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> emacs is not able to write utf-8 if the file is encoded in ASCII. It
> only writes in ASCII encode no matter how I do
> "set-buffer-file-coding- system"
>
> So, if somebody knows this issue and how to write utf-8 correctly when
> a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> information, it would be appreciated.
CP437 is very different from plain ASCII. It contains 8-bit characters
and there are other differences in the 0x00 - 0x1F code range. If you
ignore the 0x00-0x1F character differences you might be able to say that
CP437 is a 'superset' of ASCII, but they are not the same thing.
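
[Editor's sketch] For files that really are in an 8-bit encoding, conversion to UTF-8 does change the bytes, and the result depends on naming the correct source encoding. The Python below is a sketch under that assumption; `transcode_to_utf8` is a hypothetical helper, and the codec names `latin-1` and `cp437` are the ones Python's standard library uses.

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode data from source_encoding and re-encode it as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    # The same character, e with acute accent (U+00E9), is stored as a
    # different single byte in each legacy encoding.
    latin1_bytes = "\u00e9".encode("latin-1")  # b'\xe9'
    cp437_bytes = "\u00e9".encode("cp437")     # b'\x82'

    # With the right source encoding, both convert to the same UTF-8 bytes.
    print(transcode_to_utf8(latin1_bytes, "latin-1"))
    print(transcode_to_utf8(cp437_bytes, "cp437"))

    # With the wrong source encoding, the byte is read as another character.
    print(transcode_to_utf8(latin1_bytes, "cp437"))
```

The last call still succeeds, which is why a wrong guess about the source encoding tends to produce silently wrong text rather than an error.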
* Re: Convert UTF-8
2008-12-17 8:41 ` YOUNG
2008-12-17 9:59 ` Thierry Volpiatto
2008-12-17 11:17 ` Giorgos Keramidas
@ 2008-12-17 12:04 ` Xah Lee
2008-12-18 8:35 ` YOUNG
2 siblings, 1 reply; 10+ messages in thread
From: Xah Lee @ 2008-12-17 12:04 UTC (permalink / raw)
To: help-gnu-emacs
On Dec 17, 12:41 am, YOUNG <breadn...@gmail.com> wrote:
> Well, I have no problem to load UTF-8 file with emacs at all.
>
> The problem is that emacs is not able to write UTF-8 at all.
>
> For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> Latin 1 to 9; there are various aliases to indicating of it, but you
> already know what it means.), I set it up with M-x set-buffer-file-
> coding-system for writing utf-8 encoding. And, write (or save) it.
> After that, exit the emacs and re-run it again, and try to read the
> saved file to be expected UTF-8 encoding, but it reads again in ASCII.
> It does not mean emacs can't read utf-8, but the file itself is not
> encoded UTF-8. I check the file's encoding system with other
> application like NotePAD++ or other editors, and all say the file is
> still ASCII mode even though I write it as utf-8 in emacs.
>
> Again, there is no problem in reading utf-8. When a file is encoded
> utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> emacs is not able to write utf-8 if the file is encoded in ASCII. It
> only writes in ASCII encode no matter how I do "set-buffer-file-coding-
> system"
>
> So, if somebody knows this issue and how to write utf-8 correctly when
> a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> information, it would be appreciated.
>
> Thanks,
As others have mentioned, utf-8 is a superset of ascii, so a file
that contains only ascii characters is identical in either encoding.
You mentioned ISO8859, which is not ascii. I read your 2 posts, but
don't quite understand what you wanted.
For some tips on using unicode with emacs, you might check out:
• Emacs and Unicode Tips
http://xahlee.org/emacs/emacs_n_unicode.html
You might also beef up your understanding of char encoding:
http://en.wikipedia.org/wiki/ISO8859
http://en.wikipedia.org/wiki/ASCII
http://en.wikipedia.org/wiki/UTF-8
Xah
∑ http://xahlee.org/
☄
* Re: Convert UTF-8
2008-12-17 12:04 ` Xah Lee
@ 2008-12-18 8:35 ` YOUNG
2008-12-18 14:56 ` Harald Hanche-Olsen
0 siblings, 1 reply; 10+ messages in thread
From: YOUNG @ 2008-12-18 8:35 UTC (permalink / raw)
To: help-gnu-emacs
On Dec 17, 4:04 am, Xah Lee <xah...@gmail.com> wrote:
> On Dec 17, 12:41 am, YOUNG <breadn...@gmail.com> wrote:
>
>
>
> > Well, I have no problem to load UTF-8 file with emacs at all.
>
> > The problem is that emacs is not able to write UTF-8 at all.
>
> > For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> > Latin 1 to 9; there are various aliases to indicating of it, but you
> > already know what it means.), I set it up with M-x set-buffer-file-
> > coding-system for writing utf-8 encoding. And, write (or save) it.
> > After that, exit the emacs and re-run it again, and try to read the
> > saved file to be expected UTF-8 encoding, but it reads again in ASCII.
> > It does not mean emacs can't read utf-8, but the file itself is not
> > encoded UTF-8. I check the file's encoding system with other
> > application like NotePAD++ or other editors, and all say the file is
> > still ASCII mode even though I write it as utf-8 in emacs.
>
> > Again, there is no problem in reading utf-8. When a file is encoded
> > utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> > emacs is not able to write utf-8 if the file is encoded in ASCII. It
> > only writes in ASCII encode no matter how I do "set-buffer-file-coding-
> > system"
>
> > So, if somebody knows this issue and how to write utf-8 correctly when
> > a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> > information, it would be appreciated.
>
> > Thanks,
>
> as other have mentioned, utf-8 is just a super set of ascii, so files
> encoded in either are identical.
>
> You mentioned ISO8859, which is not ascii. I read your 2 posts, but
> don't quite understand what you wanted.
>
> For some unicode with emacs tips, you might checkout:
>
> • Emacs and Unicode Tips
> http://xahlee.org/emacs/emacs_n_unicode.html
>
> You might also beefup understanding of char encoding:
>
> http://en.wikipedia.org/wiki/ISO8859
> http://en.wikipedia.org/wiki/ASCII
> http://en.wikipedia.org/wiki/UTF-8
>
> Xah
> ∑ http://xahlee.org/
>
> ☄
Hi,
I finally know what the problem is. Thank you all for helping with
this issue.
I am not an expert on encoding systems, but I am grateful for this
opportunity to learn about them.
The problem is the BOM (Byte Order Mark). For utf-8 it is usually
avoided, since a BOM at the start of a file can cause conflicts when
special characters are expected at the very beginning, like '#!' in a
Unix shell script.
Therefore, if the text contains no characters that need at least
8 bits to be represented in utf-8, the encoding of the file is
undefined, or ASCII (I am not sure that is the right term, but let's
call it ASCII here for convenience), as far as Emacs is concerned.
I could conclude emacs does not have the feature of having BOM in
utf-8. It only supports utf-8 without BOM. So I can understand why
the text was not written as utf-8 when it does not contain any actual
utf-8 characters. If the text does contain such a character and is
saved as utf-8, then there is no problem writing utf-8 without a BOM.
Detailed information about unicode and BOM is found in
http://unicode.org/faq/utf_bom.html
Thank you,
* Re: Convert UTF-8
2008-12-18 8:35 ` YOUNG
@ 2008-12-18 14:56 ` Harald Hanche-Olsen
0 siblings, 0 replies; 10+ messages in thread
From: Harald Hanche-Olsen @ 2008-12-18 14:56 UTC (permalink / raw)
To: help-gnu-emacs
+ YOUNG <breadncup@gmail.com>:
> I could conclude emacs does not have the feature of having BOM in
> utf-8. It only supports utf-8 without BOM.
Not true. But you have to put the BOM (ZERO WIDTH NO-BREAK SPACE,
really) there yourself, since otherwise as you noted (in the elided
text) it can play havoc with shell scripts etc. If you want, e.g., every
file that is visited in text mode to start with a BOM you can add a hook
function to before-save-hook that ensures this before saving.
Also, at least the emacsen I am currently using (version 23 from CVS)
will recognize an initial BOM and automagically pick the utf-8 encoding
when it sees the corresponding three bytes at the top of the file.
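
[Editor's sketch] The "corresponding three bytes" are the UTF-8 encoding of U+FEFF. The Python below is an illustration only and says nothing about how Emacs implements its detection; `has_utf8_bom` is a hypothetical helper, while `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of Python's standard library. It also shows why a BOM interferes with a '#!' line.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if data starts with the UTF-8 encoded BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


if __name__ == "__main__":
    # U+FEFF (ZERO WIDTH NO-BREAK SPACE) encoded as UTF-8 is the BOM.
    print("\ufeff".encode("utf-8") == codecs.BOM_UTF8)

    script = "#!/bin/sh\necho hello\n"
    without_bom = script.encode("utf-8")
    with_bom = script.encode("utf-8-sig")  # this codec prepends the BOM

    print(has_utf8_bom(without_bom), has_utf8_bom(with_bom))
    # With a BOM the file no longer begins with the bytes "#!".
    print(without_bom.startswith(b"#!"), with_bom.startswith(b"#!"))
    # Decoding with utf-8-sig strips the BOM again.
    print(with_bom.decode("utf-8-sig") == script)
```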
> Detailed information about unicode and BOM is found in
> http://unicode.org/faq/utf_bom.html
The use of zero width no-break space as a marker to indicate coding is
also widely regarded as unwise. I am too lazy to find any of the
references that will support my claim, so take it with a grain of salt
if you will.
--
* Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/>
- It is undesirable to believe a proposition
when there is no ground whatsoever for supposing it is true.
-- Bertrand Russell
Thread overview: 10+ messages
2008-12-17 1:05 Convert UTF-8 YOUNG
2008-12-17 2:27 ` Andreas Politz
2008-12-17 7:54 ` Harald Hanche-Olsen
2008-12-17 8:41 ` YOUNG
2008-12-17 9:59 ` Thierry Volpiatto
2008-12-17 11:17 ` Giorgos Keramidas
2008-12-17 12:04 ` Xah Lee
2008-12-18 8:35 ` YOUNG
2008-12-18 14:56 ` Harald Hanche-Olsen
2008-12-17 10:43 ` Peter Dyballa