Detecting BOM for UTF files

unofficial mirror of emacs-devel@gnu.org 

* Detecting BOM for UTF files
@ 2003-10-15 13:55 Sébastien Kirche
  2003-10-15 14:23 ` Andreas Schwab
  0 siblings, 1 reply; 5+ messages in thread
From: Sébastien Kirche @ 2003-10-15 13:55 UTC (permalink / raw)


Hi,
I recently had to switch my .emacs to UTF-8 encoding, because I was
having trouble keeping French accents properly encoded between my home
Emacs (Windows + Linux) and my work Emacs (Mac OS X).

I then saw that Emacs cannot automatically recognize the UTF-8 format
of my .emacs when reloading it.
I have seen that there is no BOM (byte order mark) at the beginning,
and adding one does not help either.

It was fixed, though, by adding

; -*- mode: emacs-lisp; coding: utf-8; -*-

at the beginning.

Emacs seems to handle UTF files properly, but couldn't it "guess" their
encoding by reading BOMs?

Sébastien Kirche
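
The kind of BOM-based guessing asked about above can be sketched outside Emacs. This is a minimal Python illustration, not how Emacs itself detects coding systems; the helper name `sniff_bom` and the table `_BOMS` are hypothetical, and only the BOM constants from Python's standard `codecs` module are real.

```python
import codecs

# Longer BOMs are listed first: the UTF-32-LE BOM (FF FE 00 00) begins
# with the same two bytes as the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A result of `(None, 0)` says nothing about the encoding; it only means the file carries no BOM, which is the normal case for UTF-8 text.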


* Re: Detecting BOM for UTF files
  2003-10-15 13:55 Detecting BOM for UTF files Sébastien Kirche
@ 2003-10-15 14:23 ` Andreas Schwab
  2003-10-15 14:58   ` Sébastien Kirche
  0 siblings, 1 reply; 5+ messages in thread
From: Andreas Schwab @ 2003-10-15 14:23 UTC (permalink / raw)
  Cc: emacs-devel

Sébastien Kirche <sebastien.kirche@sage.com> writes:

> Hi,
> I recently had to switch my .emacs to UTF-8 encoding, because I was having
> trouble keeping French accents properly encoded between my home Emacs
> (Windows + Linux) and my work Emacs (Mac OS X).
>
> I then saw that Emacs cannot automatically recognize the UTF-8 format of
> my .emacs when reloading it.
> I have seen that there is no BOM (byte order mark) at the beginning, and
> adding one does not help either.

??? UTF-8 does not need a BOM, it's an 8-bit encoding.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: Detecting BOM for UTF files
  2003-10-15 14:23 ` Andreas Schwab
@ 2003-10-15 14:58   ` Sébastien Kirche
  2003-10-15 15:08     ` Andreas Schwab
  2003-10-15 15:20     ` Benjamin Riefenstahl
  0 siblings, 2 replies; 5+ messages in thread
From: Sébastien Kirche @ 2003-10-15 14:58 UTC (permalink / raw)
  Cc: emacs-devel


On Wednesday, 15 Oct 2003, at 16:23 Europe/Paris, Andreas Schwab wrote:

> ??? UTF-8 does not need a BOM, it's an 8-bit encoding.

Hmm, the Unicode.org FAQ explains that a BOM exists for UTF-8
too:

see http://www.unicode.org/faq/utf_bom.html#25

Anyway, should I understand that it isn't possible to guess the
encoding without a BOM?

Sébastien Kirche


* Re: Detecting BOM for UTF files
  2003-10-15 14:58   ` Sébastien Kirche
@ 2003-10-15 15:08     ` Andreas Schwab
  2003-10-15 15:20     ` Benjamin Riefenstahl
  1 sibling, 0 replies; 5+ messages in thread
From: Andreas Schwab @ 2003-10-15 15:08 UTC (permalink / raw)
  Cc: emacs-devel

Sébastien Kirche <sebastien.kirche@sage.com> writes:

> Anyway, should I understand that it isn't possible to guess the
> encoding without a BOM?

Such a BOM is no more reliable an indicator than anything else, since it
is composed of bytes that are also valid ISO-8859-1, for example.

Andreas.
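
The point about ISO-8859-1 can be checked directly. The following is a small Python sketch, assuming only the standard `codecs` module; the variable names are illustrative.

```python
import codecs

bom = codecs.BOM_UTF8                  # the three bytes EF BB BF

# ISO-8859-1 assigns a character to every byte value, so this decode
# cannot fail: the BOM bytes simply become three Latin-1 characters.
as_latin1 = bom.decode("iso-8859-1")

# Read as UTF-8, the same three bytes are the single code point U+FEFF.
as_utf8 = bom.decode("utf-8")

print(repr(as_latin1), repr(as_utf8))
```

Both decodings succeed, so the presence of these bytes is a strong hint of UTF-8 but not proof.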



* Re: Detecting BOM for UTF files
  2003-10-15 14:58   ` Sébastien Kirche
  2003-10-15 15:08     ` Andreas Schwab
@ 2003-10-15 15:20     ` Benjamin Riefenstahl
  1 sibling, 0 replies; 5+ messages in thread
From: Benjamin Riefenstahl @ 2003-10-15 15:20 UTC (permalink / raw)
  Cc: emacs-devel

Hi Sébastien,

> Andreas Schwab wrote:
>
>> ??? UTF-8 does not need a BOM, it's an 8-bit encoding.

Sébastien Kirche <sebastien.kirche@sage.com> writes:
> Hmm, the Unicode.org FAQ explains that a BOM exists for
> UTF-8 too:

There is no contradiction.  You *can* add a BOM to UTF-8 text, it just
doesn't serve a purpose, because UTF-8 doesn't have a "byte order" to
mark.  In addition, the BOM makes text manipulation more difficult, so
it is actually not recommended by the Unicode standard.
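
One way a BOM complicates text manipulation is plain concatenation. The sketch below is a Python illustration with made-up sample strings; `utf-8-sig` is Python's standard codec that strips a leading UTF-8 BOM.

```python
import codecs

# Two hypothetical files, each saved with a UTF-8 BOM.
part1 = codecs.BOM_UTF8 + "été".encode("utf-8")
part2 = codecs.BOM_UTF8 + "hiver".encode("utf-8")

# Byte-level concatenation, as a tool like `cat` would perform it.
joined = (part1 + part2).decode("utf-8")

# Both BOMs survive as U+FEFF: one at the start, one in mid-text.
stray = joined.count("\ufeff")

# "utf-8-sig" removes only the leading BOM; the inner one remains.
cleaned = (part1 + part2).decode("utf-8-sig")
```

Any tool that joins, splits, or compares such files has to know about the BOM and handle it explicitly.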

> Anyway, should I understand that it isn't possible to guess the
> encoding without a BOM?

No.  Current versions of Emacs can handle UTF-8 automatically without
additional packages.  You just have to configure it right.  Have a
look at the function prefer-coding-system.
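
Why guessing works without a BOM can be sketched in a few lines. This is not Emacs's actual detection code, just a Python illustration under the assumption that a strict decode is an acceptable validity test; `looks_like_utf8` is a hypothetical helper name.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (pure ASCII also qualifies)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The same name encoded two ways.
utf8_bytes = "Sébastien".encode("utf-8")
latin1_bytes = "Sébastien".encode("iso-8859-1")
```

UTF-8 multi-byte sequences follow a strict lead-byte and continuation-byte pattern, so accented text saved as ISO-8859-1 almost never passes as valid UTF-8. A detector still needs a priority order for files that are valid in several encodings, which is the role a preference setting plays.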

If you have further questions, you should probably post them to an
Emacs user list or newsgroup like comp.emacs.

benny

