all messages for Emacs-related lists mirrored at yhetil.org
* Convert UTF-8
@ 2008-12-17  1:05 YOUNG
  2008-12-17  2:27 ` Andreas Politz
  2008-12-17 10:43 ` Peter Dyballa
  0 siblings, 2 replies; 10+ messages in thread
From: YOUNG @ 2008-12-17  1:05 UTC (permalink / raw)
  To: help-gnu-emacs

Hi,

I have Emacs 22.3.1 on Windows XP, and there is a file encoded in
ASCII. I am trying to read the file and convert it to UTF-8 with
Emacs.

I have tried

M-x set-buffer-file-coding-system

and set it to utf-8. I checked that the indicator in the mode line
changed to 'u', and since the buffer was now modified, it showed '**'
as well.

So I wrote the file using "C-x s".

It seemed to be fine. I then exited Emacs, started it again, and
read the file once more. However, the file had not been converted at all.

Here is the output of "describe-current-coding-system":

----------------------

Coding system for saving this buffer:
  - -- undecided-dos

Default coding system (for new files):
  u -- mule-utf-8 (alias: utf-8)

Coding system for keyboard input:
  * -- cp1252 (alias of windows-1252)

Coding system for terminal output:
  * -- cp1252 (alias of windows-1252)

Defaults for subprocess I/O:
  decoding: u -- mule-utf-8-dos

  encoding: u -- mule-utf-8-unix


Priority order for recognizing coding systems when reading files:
  1. mule-utf-8 (alias: utf-8)
  2. iso-latin-1 (alias: iso-8859-1 latin-1)
  3. mule-utf-16be-with-signature (alias: utf-16be-with-signature
     mule-utf-16-be utf-16-be)
  4. mule-utf-16le-with-signature (alias: utf-16le-with-signature
     mule-utf-16-le utf-16-le)
  5. iso-2022-jp (alias: junet)
  6. iso-2022-7bit
  7. iso-2022-7bit-lock (alias: iso-2022-int-1)
  8. iso-2022-8bit-ss2
  9. emacs-mule
  10. raw-text
  11. japanese-shift-jis (alias: shift_jis sjis cp932)
  12. chinese-big5 (alias: big5 cn-big5 cp950)
  13. no-conversion

  Other coding systems cannot be distinguished automatically
  from these, and therefore cannot be recognized automatically
  with the present coding system priorities.

  The following are decoded correctly but recognized as iso-2022-7bit-lock:
    iso-2022-7bit-ss2 iso-2022-7bit-lock-ss2 iso-2022-cn iso-2022-cn-ext
    iso-2022-jp-2 iso-2022-kr

Particular coding systems specified for certain file names:

  OPERATION	TARGET PATTERN		CODING SYSTEM(s)
  ---------	--------------		----------------
  File I/O	"\\.dz\\'"		(no-conversion . no-conversion)
		"\\.g?z\\(~\\|\\.~[0-9]+~\\)?\\'"
					(no-conversion . no-conversion)
		"\\.tgz\\'"		(no-conversion . no-conversion)
		"\\.tbz\\'"		(no-conversion . no-conversion)
		"\\.bz2\\(~\\|\\.~[0-9]+~\\)?\\'"
					(no-conversion . no-conversion)
		"\\.Z\\(~\\|\\.~[0-9]+~\\)?\\'"
					(no-conversion . no-conversion)
		"\\.elc\\'"		(emacs-mule . emacs-mule)
		"\\.utf\\(-8\\)?\\'"	utf-8
		"\\(\\`\\|/\\)loaddefs.el\\'"
					(raw-text . raw-text-unix)
		"\\.tar\\'"		(no-conversion . no-conversion)
		"\\.po[tx]?\\'\\|\\.po\\."
					po-find-file-coding-system
		"\\.\\(tex\\|ltx\\|dtx\\|drv\\)\\'"
					latexenc-find-file-coding-system
		""			find-buffer-file-type-coding-system
  Process I/O	"[pP][lL][iI][nN][kK]"	(undecided-dos . undecided-dos)
		"[cC][mM][dD][pP][rR][oO][xX][yY]"
					(undecided-dos . undecided-dos)
  Network I/O	nothing specified
----------------------

Do you know how to convert a file to UTF-8 using emacs, please?




* Re: Convert UTF-8
  2008-12-17  1:05 Convert UTF-8 YOUNG
@ 2008-12-17  2:27 ` Andreas Politz
  2008-12-17  7:54   ` Harald Hanche-Olsen
  2008-12-17 10:43 ` Peter Dyballa
  1 sibling, 1 reply; 10+ messages in thread
From: Andreas Politz @ 2008-12-17  2:27 UTC (permalink / raw)
  To: help-gnu-emacs

YOUNG wrote:
> Hi,
> 
> I have a Emacs 22.3.1 for Windows XP, and there is a file encoded in
> ASCII. I am trying to read the file and convert it to UTF-8 with
> emacs.
> 

If I am not mistaken, converting an ASCII file to UTF-8 is an identity operation,
since the latter is backwards compatible with the former. So there would be nothing
to convert.
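
The identity claim is easy to check outside Emacs. Here is a minimal sketch in Python (an assumption for illustration only; nobody in the thread used Python, and the sample string is made up). It encodes the same pure-ASCII text both ways and compares the resulting bytes.

```python
# Hypothetical sample text; any string limited to code points 0-127 behaves the same.
text = "Plain ASCII text with a DOS line ending.\r\n"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure ASCII input, both encodings yield exactly the same byte sequence,
# so "converting" such a file to UTF-8 leaves it unchanged on disk.
identical = ascii_bytes == utf8_bytes
print(identical)
```

So a tool that inspects only the bytes has no way to tell such a file apart from "ASCII".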

-ap




* Re: Convert UTF-8
  2008-12-17  2:27 ` Andreas Politz
@ 2008-12-17  7:54   ` Harald Hanche-Olsen
  2008-12-17  8:41     ` YOUNG
  0 siblings, 1 reply; 10+ messages in thread
From: Harald Hanche-Olsen @ 2008-12-17  7:54 UTC (permalink / raw)
  To: help-gnu-emacs

+ Andreas Politz <politza@fh-trier.de>:

> YOUNG wrote:
>>
>> I have a Emacs 22.3.1 for Windows XP, and there is a file encoded in
>> ASCII. I am trying to read the file and convert it to UTF-8 with
>> emacs.
>>
>
> If I am not mistaken, converting a ASCII file to UTF-8 is an identity
> operation, since the later is backwards compatible to the former. So
> there would be nothing to convert.

You are not at all mistaken of course, but many people take "ASCII" to
mean their favourite eight bit character set (typically Latin 1 or 9 in
western Europe).

But since the OP reports no change to his files, maybe they really were
proper ASCII to begin with. Or maybe he is confused about how to make
emacs use UTF-8 when loading the file? If so, he could do worse than
read the emacs info file, node "Recognize coding".
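
For contrast, here is a rough Python sketch of the case where "ASCII" really means Latin 1 (Python and the sample word are assumptions for illustration, not something from the thread). The same character ends up as different bytes in the two encodings, so in this case a conversion does change the file.

```python
word = "café"  # hypothetical sample containing one non-ASCII character

latin1_bytes = word.encode("latin-1")  # 'é' is the single byte 0xE9
utf8_bytes = word.encode("utf-8")      # 'é' is the two bytes 0xC3 0xA9


def convert_latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret Latin 1 bytes as text and re-encode them as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


print(latin1_bytes, utf8_bytes)
```

The helper name `convert_latin1_to_utf8` is made up for this sketch; it only works if the input really is Latin 1.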

-- 
* Harald Hanche-Olsen     <URL:http://www.math.ntnu.no/~hanche/>
- It is undesirable to believe a proposition
  when there is no ground whatsoever for supposing it is true.
  -- Bertrand Russell



* Re: Convert UTF-8
  2008-12-17  7:54   ` Harald Hanche-Olsen
@ 2008-12-17  8:41     ` YOUNG
  2008-12-17  9:59       ` Thierry Volpiatto
                         ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: YOUNG @ 2008-12-17  8:41 UTC (permalink / raw)
  To: help-gnu-emacs

On Dec 16, 11:54 pm, Harald Hanche-Olsen <han...@math.ntnu.no> wrote:
> + Andreas Politz <poli...@fh-trier.de>:
>
> > YOUNG wrote:
>
> >> I have a Emacs 22.3.1 for Windows XP, and there is a file encoded in
> >> ASCII. I am trying to read the file and convert it to UTF-8 with
> >> emacs.
>
> > If I am not mistaken, converting a ASCII file to UTF-8 is an identity
> > operation, since the later is backwards compatible to the former. So
> > there would be nothing to convert.
>
> You are not at all mistaken of course, but many people take "ASCII" to
> mean their favourite eight bit character set (typically Latin 1 or 9 in
> western Europe).
>
> But since the OP reports no change to his files, maybe they really were
> proper ASCII to begin with. Or maybe he is confused about how to make
> emacs use UTF-8 when loading the file? If so, he could do worse than
> read the emacs info file, node "Recognize coding".
>
> --
> * Harald Hanche-Olsen     <URL:http://www.math.ntnu.no/~hanche/>
> - It is undesirable to believe a proposition
>   when there is no ground whatsoever for supposing it is true.
>   -- Bertrand Russell

Well, I have no problem loading a UTF-8 file with Emacs at all.

The problem is that Emacs is not able to write UTF-8 at all.

For example, if a file is encoded in ASCII (or CP437, or ISO 8859,
Latin 1 to 9; there are various names for these, but you
already know what I mean), I use M-x set-buffer-file-coding-system
to select utf-8 for writing, and then write (save) the file.
After that, I exit Emacs, run it again, and read the saved file,
expecting UTF-8 encoding, but it is read as ASCII again.
This does not mean Emacs can't read utf-8; the file itself is not
encoded in UTF-8. I checked the file's encoding with other
applications such as Notepad++ and other editors, and all of them say
the file is still ASCII even though I wrote it as utf-8 in Emacs.

Again, there is no problem reading utf-8. When a file is already encoded
in utf-8, Emacs reads and writes it in utf-8. That's good. However,
Emacs is not able to write utf-8 if the file is encoded in ASCII. It
only writes ASCII no matter how I use "set-buffer-file-coding-system".

So, if somebody knows about this issue and how to write utf-8 correctly when
a file is encoded in ISO 8859 (or CP437 or ASCII), I would appreciate it
if you shared the information.

Thanks,



* Re: Convert UTF-8
  2008-12-17  8:41     ` YOUNG
@ 2008-12-17  9:59       ` Thierry Volpiatto
  2008-12-17 11:17       ` Giorgos Keramidas
  2008-12-17 12:04       ` Xah Lee
  2 siblings, 0 replies; 10+ messages in thread
From: Thierry Volpiatto @ 2008-12-17  9:59 UTC (permalink / raw)
  To: help-gnu-emacs

YOUNG <breadncup@gmail.com> writes:

> On Dec 16, 11:54 pm, Harald Hanche-Olsen <han...@math.ntnu.no> wrote:
>> + Andreas Politz <poli...@fh-trier.de>:
>>
>> > YOUNG wrote:
>>
>> >> I have a Emacs 22.3.1 for Windows XP, and there is a file encoded in
>> >> ASCII. I am trying to read the file and convert it to UTF-8 with
>> >> emacs.
>>
>> > If I am not mistaken, converting a ASCII file to UTF-8 is an identity
>> > operation, since the later is backwards compatible to the former. So
>> > there would be nothing to convert.
>>
>> You are not at all mistaken of course, but many people take "ASCII" to
>> mean their favourite eight bit character set (typically Latin 1 or 9 in
>> western Europe).
>>
>> But since the OP reports no change to his files, maybe they really were
>> proper ASCII to begin with. Or maybe he is confused about how to make
>> emacs use UTF-8 when loading the file? If so, he could do worse than
>> read the emacs info file, node "Recognize coding".
>>
>> --
>> * Harald Hanche-Olsen     <URL:http://www.math.ntnu.no/~hanche/>
>> - It is undesirable to believe a proposition
>>   when there is no ground whatsoever for supposing it is true.
>>   -- Bertrand Russell
>
> Well, I have no problem to load UTF-8 file with emacs at all.
>
> The problem is that emacs is not able to write UTF-8 at all.
>
> For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> Latin 1 to 9; there are various aliases to indicating of it, but you
> already know what it means.), I set it up with M-x set-buffer-file-
> coding-system for writing utf-8 encoding. And, write (or save) it.
> After that, exit the emacs and re-run it again, and try to read the
> saved file to be expected UTF-8 encoding, but it reads again in ASCII.
> It does not mean emacs can't read utf-8, but the file itself is not
> encoded UTF-8. I check the file's encoding system with other
> application like NotePAD++ or other editors, and all say the file is
> still ASCII mode even though I write it as utf-8 in emacs.
>
> Again, there is no problem in reading utf-8. When a file is encoded
> utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> emacs is not able to write utf-8 if the file is encoded in ASCII. It
> only writes in ASCII encode no matter how I do "set-buffer-file-coding-
> system"
>
> So, if somebody knows this issue and how to write utf-8 correctly when
> a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> information, it would be appreciated.

I was using iso-8859-15 before switching my system to utf-8.
I just add this to my files (as a coding cookie, not -*- utf-8 encoding -*-):

,----
| # -*- coding: utf-8 -*-
`----

instead of 

,----
| # -*- coding: iso-8859-15 -*-
`----

-- 
A + Thierry Volpiatto
Location: Saint-Cyr-Sur-Mer - France






* Re: Convert UTF-8
  2008-12-17  1:05 Convert UTF-8 YOUNG
  2008-12-17  2:27 ` Andreas Politz
@ 2008-12-17 10:43 ` Peter Dyballa
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Dyballa @ 2008-12-17 10:43 UTC (permalink / raw)
  To: YOUNG; +Cc: help-gnu-emacs


Am 17.12.2008 um 02:05 schrieb YOUNG:

> I have a Emacs 22.3.1 for Windows XP, and there is a file encoded in
> ASCII. I am trying to read the file and convert it to UTF-8 with
> emacs.

You could also try:

	(prefer-coding-system	'utf-8)

It's a global option. The best way to set GNU Emacs' behaviour in this
area is through environment variables like LANG or LC_CTYPE that name some
UTF-8 based encoding. I don't know how this is handled in MS Losedos.

From the Options menu choose Mule and then "Set Coding Systems", from
which "For Next Command" (C-x RET c) will allow you to set a
particular coding system for reading the file.

--
Greetings

   Pete

Bake pizza not war!








* Re: Convert UTF-8
  2008-12-17  8:41     ` YOUNG
  2008-12-17  9:59       ` Thierry Volpiatto
@ 2008-12-17 11:17       ` Giorgos Keramidas
  2008-12-17 12:04       ` Xah Lee
  2 siblings, 0 replies; 10+ messages in thread
From: Giorgos Keramidas @ 2008-12-17 11:17 UTC (permalink / raw)
  To: help-gnu-emacs

On Wed, 17 Dec 2008 00:41:47 -0800 (PST), YOUNG <breadncup@gmail.com> wrote:
> Well, I have no problem to load UTF-8 file with emacs at all.
>
> The problem is that emacs is not able to write UTF-8 at all.
>
> For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> Latin 1 to 9; there are various aliases to indicating of it, but you
> already know what it means.), I set it up with M-x set-buffer-file-
> coding-system for writing utf-8 encoding. And, write (or save) it.
> After that, exit the emacs and re-run it again, and try to read the
> saved file to be expected UTF-8 encoding, but it reads again in ASCII.
> It does not mean emacs can't read utf-8, but the file itself is not
> encoded UTF-8. I check the file's encoding system with other
> application like NotePAD++ or other editors, and all say the file is
> still ASCII mode even though I write it as utf-8 in emacs.

ASCII contains only 7-bit characters.  All the characters of the 7-bit
ASCII character set map to themselves in the UTF-8 coding system.

This means that when a file contains only characters from the ASCII
character set, no conversion at all is needed from UTF-8 to ASCII or vice
versa.

If you set the buffer-file-coding-system to UTF-8 *and* type some text
that needs more than 7 bits to be represented (that is, any non-ASCII
character), then the saved file will contain multi-byte UTF-8 sequences
and other tools will be able to recognize it as UTF-8.
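
As a rough illustration of why other editors keep reporting "ASCII", here is a small Python sketch (Python, the helper name `is_ascii_only`, and the sample strings are all assumptions made for this example). It checks whether any byte in the data has its high bit set; only then can a tool see a difference between ASCII and UTF-8.

```python
def is_ascii_only(data: bytes) -> bool:
    """Return True when every byte is below 0x80, i.e. the data is valid 7-bit ASCII."""
    return all(byte < 0x80 for byte in data)


# The same buffer contents saved as UTF-8, before and after typing one non-ASCII character.
before = "hello".encode("utf-8")
after = "hello ü".encode("utf-8")

print(is_ascii_only(before))  # no high-bit bytes: indistinguishable from ASCII
print(is_ascii_only(after))   # 'ü' became the two bytes 0xC3 0xBC
```

Encoding detectors generally rely on more than this one check, so treat it as a sketch of the idea rather than a description of how any particular editor decides.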

> Again, there is no problem in reading utf-8. When a file is encoded
> utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> emacs is not able to write utf-8 if the file is encoded in ASCII. It
> only writes in ASCII encode no matter how I do
> "set-buffer-file-coding- system"
>
> So, if somebody knows this issue and how to write utf-8 correctly when
> a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> information, it would be appreciated.

CP437 is very different from plain ASCII.  It contains 8-bit characters
and there are other differences in the 0x00 - 0x1F code range.  If you
ignore the 0x00-0x1F character differences you might be able to say that
CP437 is a 'superset' of ASCII, but they are not the same thing.
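
To make the difference concrete, here is a short Python sketch (Python and the sample bytes are assumptions for illustration; the thread itself only discusses Emacs). The same 8-bit byte decodes to different characters depending on which code page is assumed, which is why the source encoding has to be known before converting to UTF-8.

```python
# Hypothetical file contents: "caf" followed by the single byte 0x82.
raw = bytes([0x63, 0x61, 0x66, 0x82])

as_cp437 = raw.decode("cp437")     # 0x82 is 'é' in CP437
as_latin1 = raw.decode("latin-1")  # 0x82 is a C1 control character in Latin 1

# Converting to UTF-8 only gives the intended text if the right source encoding was chosen.
utf8_from_cp437 = as_cp437.encode("utf-8")

print(repr(as_cp437), repr(as_latin1), utf8_from_cp437)
```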




* Re: Convert UTF-8
  2008-12-17  8:41     ` YOUNG
  2008-12-17  9:59       ` Thierry Volpiatto
  2008-12-17 11:17       ` Giorgos Keramidas
@ 2008-12-17 12:04       ` Xah Lee
  2008-12-18  8:35         ` YOUNG
  2 siblings, 1 reply; 10+ messages in thread
From: Xah Lee @ 2008-12-17 12:04 UTC (permalink / raw)
  To: help-gnu-emacs

On Dec 17, 12:41 am, YOUNG <breadn...@gmail.com> wrote:
> Well, I have no problem to load UTF-8 file with emacs at all.
>
> The problem is that emacs is not able to write UTF-8 at all.
>
> For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> Latin 1 to 9; there are various aliases to indicating of it, but you
> already know what it means.), I set it up with M-x set-buffer-file-
> coding-system for writing utf-8 encoding. And, write (or save) it.
> After that, exit the emacs and re-run it again, and try to read the
> saved file to be expected UTF-8 encoding, but it reads again in ASCII.
> It does not mean emacs can't read utf-8, but the file itself is not
> encoded UTF-8. I check the file's encoding system with other
> application like NotePAD++ or other editors, and all say the file is
> still ASCII mode even though I write it as utf-8 in emacs.
>
> Again, there is no problem in reading utf-8. When a file is encoded
> utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> emacs is not able to write utf-8 if the file is encoded in ASCII. It
> only writes in ASCII encode no matter how I do "set-buffer-file-coding-
> system"
>
> So, if somebody knows this issue and how to write utf-8 correctly when
> a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> information, it would be appreciated.
>
> Thanks,

As others have mentioned, utf-8 is a superset of ascii, so a file that
contains only ascii characters is byte-for-byte identical in either encoding.

You mentioned ISO8859, which is not ascii. I read your 2 posts, but
don't quite understand what you wanted.

For some tips on unicode with emacs, you might check out:

• Emacs and Unicode Tips
  http://xahlee.org/emacs/emacs_n_unicode.html

You might also beef up your understanding of char encoding:

http://en.wikipedia.org/wiki/ISO8859
http://en.wikipedia.org/wiki/ASCII
http://en.wikipedia.org/wiki/UTF-8

  Xah
∑ http://xahlee.org/

* Re: Convert UTF-8
  2008-12-17 12:04       ` Xah Lee
@ 2008-12-18  8:35         ` YOUNG
  2008-12-18 14:56           ` Harald Hanche-Olsen
  0 siblings, 1 reply; 10+ messages in thread
From: YOUNG @ 2008-12-18  8:35 UTC (permalink / raw)
  To: help-gnu-emacs

On Dec 17, 4:04 am, Xah Lee <xah...@gmail.com> wrote:
> On Dec 17, 12:41 am, YOUNG <breadn...@gmail.com> wrote:
>
>
>
> > Well, I have no problem to load UTF-8 file with emacs at all.
>
> > The problem is that emacs is not able to write UTF-8 at all.
>
> > For example, if a file is encoded in ASCII (or, CP437, or ISO 8859 or
> > Latin 1 to 9; there are various aliases to indicating of it, but you
> > already know what it means.), I set it up with M-x set-buffer-file-
> > coding-system for writing utf-8 encoding. And, write (or save) it.
> > After that, exit the emacs and re-run it again, and try to read the
> > saved file to be expected UTF-8 encoding, but it reads again in ASCII.
> > It does not mean emacs can't read utf-8, but the file itself is not
> > encoded UTF-8. I check the file's encoding system with other
> > application like NotePAD++ or other editors, and all say the file is
> > still ASCII mode even though I write it as utf-8 in emacs.
>
> > Again, there is no problem in reading utf-8. When a file is encoded
> > utf-8 correctly, emacs reads/writes it in utf-8. It's good. However,
> > emacs is not able to write utf-8 if the file is encoded in ASCII. It
> > only writes in ASCII encode no matter how I do "set-buffer-file-coding-
> > system"
>
> > So, if somebody knows this issue and how to write utf-8 correctly when
> > a file is encoded in ISO8859 (or CP437 or ASCII), and if you share the
> > information, it would be appreciated.
>
> > Thanks,
>
> as other have mentioned, utf-8 is just a super set of ascii, so files
> encoded in either are identical.
>
> You mentioned ISO8859, which is not ascii. I read your 2 posts, but
> don't quite understand what you wanted.
>
> For some unicode with emacs tips, you might checkout:
>
> • Emacs and Unicode Tips
>  http://xahlee.org/emacs/emacs_n_unicode.html
>
> You might also beefup understanding of char encoding:
>
> http://en.wikipedia.org/wiki/ISO8859
> http://en.wikipedia.org/wiki/ASCII
> http://en.wikipedia.org/wiki/UTF-8
>
>   Xah
> ∑http://xahlee.org/
>

Hi,

Finally, I know what the problem is. Thank you all for helping with this
issue.

I am not an expert on encoding systems, so I am thankful for this
opportunity to learn about them.

The problem is the BOM (Byte Order Mark). In UTF-8 it is usually avoided,
since a BOM header can cause conflicts when special characters are
expected at the very start of a file, like '#!' in a Unix shell script.
Therefore, if the text contains no character that needs at least 8 bits
to be represented in utf-8, the text encoding is undefined, or ASCII (I am
not sure if that is the right term, but let's call it ASCII here for
convenience), in Emacs.

I could conclude emacs does not have the feature of having BOM in
utf-8. It only supports utf-8 without BOM. So I can understand why
the text was not written as utf-8 when it did not contain any actual
non-ASCII characters. If the text contains such a character and is saved
as utf-8, then there is no problem writing utf-8 without a BOM.

Detailed information about unicode and BOM is found in
http://unicode.org/faq/utf_bom.html

Thank you,




* Re: Convert UTF-8
  2008-12-18  8:35         ` YOUNG
@ 2008-12-18 14:56           ` Harald Hanche-Olsen
  0 siblings, 0 replies; 10+ messages in thread
From: Harald Hanche-Olsen @ 2008-12-18 14:56 UTC (permalink / raw)
  To: help-gnu-emacs

+ YOUNG <breadncup@gmail.com>:

> I could conclude emacs does not have the feature of having BOM in
> utf-8. It only supports utf-8 without BOM.

Not true. But you have to put the BOM (ZERO WIDTH NO-BREAK SPACE,
really) there yourself, since otherwise as you noted (in the elided
text) it can play havoc with shell scripts etc. If you want, e.g., every
file that is visited in text mode to start with a BOM you can add a hook
function to before-save-hook that ensures this before saving.

Also, at least the emacsen I am currently using (version 23 from CVS)
will recognize an initial BOM and automagically pick the utf-8 encoding
when it sees the corresponding three bytes at the top of the file.
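
For reference, the three bytes in question can be inspected with a few lines of Python (Python is used here only as an illustration and is an assumption, as is the sample shell-script line; none of this is Emacs behaviour). The sketch shows what a UTF-8 file with and without the signature looks like at the byte level.

```python
import codecs

text = "#!/bin/sh\n"  # hypothetical first line of a shell script

without_bom = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")  # Python's codec that prepends the UTF-8 signature

# The signature is U+FEFF encoded as UTF-8: the bytes EF BB BF.
print(codecs.BOM_UTF8)
print(with_bom[:3] == codecs.BOM_UTF8)

# Decoding with "utf-8-sig" strips the signature; plain "utf-8" keeps it as U+FEFF,
# which is what would push "#!" away from the very start of a script.
stripped = with_bom.decode("utf-8-sig")
kept = with_bom.decode("utf-8")
```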

> Detailed information about unicode and BOM is found in
> http://unicode.org/faq/utf_bom.html

The use of zero width no-break space as a marker to indicate coding is
also widely regarded as unwise. I am too lazy to find any of the
references that will support my claim, so take it with a grain of salt
if you will.

-- 
* Harald Hanche-Olsen     <URL:http://www.math.ntnu.no/~hanche/>
- It is undesirable to believe a proposition
  when there is no ground whatsoever for supposing it is true.
  -- Bertrand Russell

