* How to keep character encoding in text file...
@ 2009-03-05 11:17 Marko Myllymaki
  2009-03-05 13:33 ` Pascal J. Bourguignon
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Marko Myllymaki @ 2009-03-05 11:17 UTC (permalink / raw)
  To: help-gnu-emacs

I usually work with UTF-8 encoded files and my system is set up for 
UTF-8 (text may be in english, finnish, swedish, german, russian...)

However, sometimes I edit latin1 -encoded files and I would like to have 
the encoding automatically kept that way when saving files. Now it seems 
to change unwantedly to UTF-8 sometimes... I have to use iconv command 
line tool to change this...

So... if I load UTF-8 encoded file, emacs always saves it that way. If I 
open latin1-encoded file, it should keep it in the original encoding.

Maybe something to put in .emacs to achieve this...



* Re: How to keep character encoding in text file...
  2009-03-05 11:17 How to keep character encoding in text file Marko Myllymaki
@ 2009-03-05 13:33 ` Pascal J. Bourguignon
  2009-03-06  7:54   ` Marko Myllymaki
  2009-03-05 18:50 ` Eli Zaretskii
       [not found] ` <mailman.2443.1236279043.31690.help-gnu-emacs@gnu.org>
  2 siblings, 1 reply; 12+ messages in thread
From: Pascal J. Bourguignon @ 2009-03-05 13:33 UTC (permalink / raw)
  To: help-gnu-emacs

Marko Myllymaki <firstname.lastname@iki.fi> writes:

> I usually work with UTF-8 encoded files and my system is set up for
> UTF-8 (text may be in english, finnish, swedish, german, russian...)
>
> However, sometimes I edit latin1 -encoded files and I would like to
> have the encoding automatically kept that way when saving files. Now
> it seems to change unwantedly to UTF-8 sometimes... I have to use
> iconv command line tool to change this...
>
> So... if I load UTF-8 encoded file, emacs always saves it that way. If
> I open latin1-encoded file, it should keep it in the original
> encoding.

I don't know if there's an easy way to do that.


> Maybe something to put in .emacs to achieve this...

The easiest way is to put a comment on the first two lines containing:

-*- coding:iso-8859-1 -*-


Alternatively, you could put a comment near the end of the file (the
list must start within the last 3000 characters) containing:

Local Variables:
coding: iso-8859-1
End:



Otherwise, you may use the file extension, or the file location  (or
really any regexp on the file path) to determine the encoding,
customizing the file-coding-system-alist variable.

(push '("^/some/dir/latin-1-files/.*" iso-8859-1 . iso-8859-1)
       file-coding-system-alist)

(push '("\\.iso-8859-1$" iso-8859-1 . iso-8859-1)
       file-coding-system-alist)
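
The same kind of rule can also be added with modify-coding-system-alist,
which updates file-coding-system-alist for you. A minimal sketch, assuming
a hypothetical ".latin1" file extension that is not from this thread:

```elisp
;; Assumption: ".latin1" is a made-up naming convention for this example.
;; Decode and encode every file whose name ends in ".latin1" as ISO-8859-1.
(modify-coding-system-alist 'file "\\.latin1\\'" 'iso-8859-1)
```

The regexp uses \\' to anchor at the end of the file name; $ would also
match before a newline embedded in the name.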

-- 
__Pascal Bourguignon__



* Re: How to keep character encoding in text file...
  2009-03-05 11:17 How to keep character encoding in text file Marko Myllymaki
  2009-03-05 13:33 ` Pascal J. Bourguignon
@ 2009-03-05 18:50 ` Eli Zaretskii
       [not found] ` <mailman.2443.1236279043.31690.help-gnu-emacs@gnu.org>
  2 siblings, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2009-03-05 18:50 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Marko Myllymaki <firstname.lastname@iki.fi>
> Date: Thu, 05 Mar 2009 13:17:50 +0200
> 
> I usually work with UTF-8 encoded files and my system is set up for 
> UTF-8 (text may be in english, finnish, swedish, german, russian...)
> 
> However, sometimes I edit latin1 -encoded files and I would like to have 
> the encoding automatically kept that way when saving files. Now it seems 
> to change unwantedly to UTF-8 sometimes...

Emacs should do that only if you insert characters that cannot be
encoded with Latin-1.  If that is not the case, i.e. if you find that
Emacs changes encoding from Latin-1 to UTF-8 although _all_ the
characters are encodable in Latin-1, that should be a bug worth
reporting (with a reproducible test case).

> So... if I load UTF-8 encoded file, emacs always saves it that way. If I 
> open latin1-encoded file, it should keep it in the original encoding.

Yes, that's how it's supposed to work, if you don't let any
non-Latin-1 characters creep in.





* Re: How to keep character encoding in text file...
       [not found] ` <mailman.2443.1236279043.31690.help-gnu-emacs@gnu.org>
@ 2009-03-06  7:39   ` Marko Myllymaki
  2009-03-06  8:21     ` Pascal J. Bourguignon
  2009-03-06 10:51     ` Eli Zaretskii
  0 siblings, 2 replies; 12+ messages in thread
From: Marko Myllymaki @ 2009-03-06  7:39 UTC (permalink / raw)
  To: help-gnu-emacs

Eli Zaretskii wrote:
>> So... if I load UTF-8 encoded file, emacs always saves it that way. If I 
>> open latin1-encoded file, it should keep it in the original encoding.
> 
> Yes, that's how it's supposed to work, if you don't let any
> non-Latin-1 characters creep in.

Okay, that might be the problem... because my system defaults to UTF-8, 
I guess that there is some keyboard input encoding in emacs which uses 
UTF-8.

Therefore if I enter "baz" in latin1 buffer, everything is okay, but 
"foobar åäö" has some UTF-8 and it then forces buffer encoding to UTF-8...

Maybe I could change input encoding depending on file encoding... hmm ;)

Or maybe I try to figure out how to make menu items which force buffer 
encoding and write the file.
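
A minimal sketch of such a command, assuming the hypothetical name
my-save-as-latin-1. It relies on set-buffer-file-coding-system, which is
also what C-x RET f runs interactively:

```elisp
(defun my-save-as-latin-1 ()
  "Set the current buffer's file encoding to Latin-1 and save it.
The command name is hypothetical, not something Emacs ships."
  (interactive)
  ;; Changing the coding system marks the buffer as modified,
  ;; so `save-buffer' actually writes the file.
  (set-buffer-file-coding-system 'iso-latin-1)
  (save-buffer))
```

If the buffer contains characters that Latin-1 cannot represent, Emacs
should ask for another coding system when saving rather than write them
silently.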



* Re: How to keep character encoding in text file...
  2009-03-05 13:33 ` Pascal J. Bourguignon
@ 2009-03-06  7:54   ` Marko Myllymaki
  0 siblings, 0 replies; 12+ messages in thread
From: Marko Myllymaki @ 2009-03-06  7:54 UTC (permalink / raw)
  To: help-gnu-emacs

Pascal J. Bourguignon wrote:
> The easiest way is to put a comment on the first two lines containing:
> -*- coding:iso-8859-1 -*-

Thanks! I might try this also in the source files.



* Re: How to keep character encoding in text file...
  2009-03-06  7:39   ` Marko Myllymaki
@ 2009-03-06  8:21     ` Pascal J. Bourguignon
  2009-03-06  9:44       ` Peter Dyballa
                         ` (3 more replies)
  2009-03-06 10:51     ` Eli Zaretskii
  1 sibling, 4 replies; 12+ messages in thread
From: Pascal J. Bourguignon @ 2009-03-06  8:21 UTC (permalink / raw)
  To: help-gnu-emacs

Marko Myllymaki <firstname.lastname@iki.fi> writes:
> Therefore if I enter "baz" in latin1 buffer, everything is okay, but
> "foobar åäö" has some UTF-8 and it then forces buffer encoding to
> UTF-8...

Perhaps because your keyboard doesn't send compound characters, but
unicode combination sequences?  iso-8859-1 contains acute and grave
accents, but no umlaut.


-- 
__Pascal Bourguignon__



* Re: How to keep character encoding in text file...
  2009-03-06  8:21     ` Pascal J. Bourguignon
@ 2009-03-06  9:44       ` Peter Dyballa
  2009-03-06 10:13         ` Nikolaj Schumacher
       [not found]       ` <mailman.2491.1236332716.31690.help-gnu-emacs@gnu.org>
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Peter Dyballa @ 2009-03-06  9:44 UTC (permalink / raw)
  To: Pascal J. Bourguignon; +Cc: help-gnu-emacs


Am 06.03.2009 um 09:21 schrieb Pascal J. Bourguignon:

> Marko Myllymaki <firstname.lastname@iki.fi> writes:
>> Therefore if I enter "baz" in latin1 buffer, everything is okay, but
>> "foobar åäö" has some UTF-8 and it then forces buffer encoding to
>> UTF-8...
>
> Perhaps because your keyboard doesn't send compound characters, but
> unicode combination sequences?  iso-8859-1 contains acute and grave
> accents, but no umlaut.
>

You're wrong:

Ä = 304 = 196 = C4 = U+00C4 =    C3 84 : LATIN CAPITAL LETTER A WITH DIAERESIS
Ë = 313 = 203 = CB = U+00CB =    C3 8B : LATIN CAPITAL LETTER E WITH DIAERESIS
Ï = 317 = 207 = CF = U+00CF =    C3 8F : LATIN CAPITAL LETTER I WITH DIAERESIS
Ö = 326 = 214 = D6 = U+00D6 =    C3 96 : LATIN CAPITAL LETTER O WITH DIAERESIS
Ü = 334 = 220 = DC = U+00DC =    C3 9C : LATIN CAPITAL LETTER U WITH DIAERESIS
ä = 344 = 228 = E4 = U+00E4 =    C3 A4 : LATIN SMALL LETTER A WITH DIAERESIS
ë = 353 = 235 = EB = U+00EB =    C3 AB : LATIN SMALL LETTER E WITH DIAERESIS
ï = 357 = 239 = EF = U+00EF =    C3 AF : LATIN SMALL LETTER I WITH DIAERESIS
ö = 366 = 246 = F6 = U+00F6 =    C3 B6 : LATIN SMALL LETTER O WITH DIAERESIS
ü = 374 = 252 = FC = U+00FC =    C3 BC : LATIN SMALL LETTER U WITH DIAERESIS
ÿ = 377 = 255 = FF = U+00FF =    C3 BF : LATIN SMALL LETTER Y WITH DIAERESIS

(Although in German only ä, ö, ü, Ä, Ö, and Ü are accepted as umlauts.)
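
The difference between the single Latin-1 byte and the two UTF-8 bytes
listed for each character can be checked from Lisp. A small sketch, meant
to be evaluated in *scratch* for instance:

```elisp
;; Each of ä, ö, ü is one byte in Latin-1 but two bytes in UTF-8.
(string-bytes (encode-coding-string "äöü" 'iso-latin-1)) ; 3
(string-bytes (encode-coding-string "äöü" 'utf-8))       ; 6
```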

--
Greetings

   Pete

A TRUE Klingon warrior does not comment his code.






* Re: How to keep character encoding in text file...
       [not found]       ` <mailman.2491.1236332716.31690.help-gnu-emacs@gnu.org>
@ 2009-03-06 10:12         ` Pascal J. Bourguignon
  0 siblings, 0 replies; 12+ messages in thread
From: Pascal J. Bourguignon @ 2009-03-06 10:12 UTC (permalink / raw)
  To: help-gnu-emacs

Peter Dyballa <Peter_Dyballa@Web.DE> writes:

> Am 06.03.2009 um 09:21 schrieb Pascal J. Bourguignon:
>
>> Marko Myllymaki <firstname.lastname@iki.fi> writes:
>>> Therefore if I enter "baz" in latin1 buffer, everything is okay, but
>>> "foobar åäö" has some UTF-8 and it then forces buffer encoding to
>>> UTF-8...
>>
>> Perhaps because your keyboard doesn't send compound characters, but
>> unicode combination sequences?  iso-8859-1 contains acute and grave
>> accents, but no umlaut.
>>
>
> You're wrong:
>
> Ä = 304 = 196 = C4 = U+00C4 =    C3 84 : LATIN CAPITAL LETTER A WITH
> DIAERESIS

I meant the character #x000A8     168  ¨  "DIAERESIS"
and indeed I was wrong, I searched umlaut, but it's called diaeresis.


-- 
__Pascal Bourguignon__



* Re: How to keep character encoding in text file...
  2009-03-06  9:44       ` Peter Dyballa
@ 2009-03-06 10:13         ` Nikolaj Schumacher
  0 siblings, 0 replies; 12+ messages in thread
From: Nikolaj Schumacher @ 2009-03-06 10:13 UTC (permalink / raw)
  To: Peter Dyballa; +Cc: Pascal J. Bourguignon, help-gnu-emacs

Peter Dyballa <Peter_Dyballa@Web.DE> wrote:

> Am 06.03.2009 um 09:21 schrieb Pascal J. Bourguignon:
>
>> Marko Myllymaki <firstname.lastname@iki.fi> writes:
>>> Therefore if I enter "baz" in latin1 buffer, everything is okay, but
>>> "foobar åäö" has some UTF-8 and it then forces buffer encoding to
>>> UTF-8...
>>
>> Perhaps because your keyboard doesn't send compound characters, but
>> unicode combination sequences?  iso-8859-1 contains acute and grave
>> accents, but no umlaut.
>>
>
> You're wrong:
>
> Ä = 304 = 196 = C4 = U+00C4 =    C3 84 : LATIN CAPITAL LETTER A WITH

He meant there is no umlaut combining character, such as 0x0308 in
unicode.  iso-8859-1 contains ` and ´ as individual chars.  But I don't
think those are combining characters, anyway.  If the keyboard has
"dead keys", that's an OS preference and not represented in the character set.


regards,
Nikolaj Schumacher





* Re: How to keep character encoding in text file...
  2009-03-06  7:39   ` Marko Myllymaki
  2009-03-06  8:21     ` Pascal J. Bourguignon
@ 2009-03-06 10:51     ` Eli Zaretskii
  1 sibling, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2009-03-06 10:51 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Marko Myllymaki <firstname.lastname@iki.fi>
> Date: Fri, 06 Mar 2009 09:39:56 +0200
> 
> Eli Zaretskii wrote:
> >> So... if I load UTF-8 encoded file, emacs always saves it that way. If I 
> >> open latin1-encoded file, it should keep it in the original encoding.
> > 
> > Yes, that's how it's supposed to work, if you don't let any
> > non-Latin-1 characters creep in.
> 
> Okay, that might be the problem... because my system defaults to UTF-8, 
> I guess that there is some keyboard input encoding in emacs which uses 
> UTF-8.

No, that shouldn't in itself be a problem, as long as the characters
your keyboard input inserts are encodable in Latin-1.

When Emacs receives keyboard input, it first decodes all the
characters into its internal representation.  After such decoding, it
no longer matters how the characters were transmitted to Emacs, only
if they have a valid encoding in Latin-1.

In other words, keyboard input encoding has no direct relation to when
Emacs decides that the original file's encoding cannot be used to save
the modified buffer.

> Therefore if I enter "baz" in latin1 buffer, everything is okay, but 
> "foobar åäö" has some UTF-8 and it then forces buffer encoding to UTF-8...

All 3 characters you cited above are encodable in Latin-1, so this
cannot be the problem.  You need to find out (e.g., by using the 
"C-u C-x =" command on suspect characters) which one of the characters
you typed cannot be encoded in Latin-1.
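
To avoid checking suspect characters one by one, a command can jump to the
first offender. A minimal sketch built on unencodable-char-position; the
command name my-goto-unencodable-char is hypothetical, and it assumes the
buffer's coding system is already set to the one the file was read with:

```elisp
(defun my-goto-unencodable-char ()
  "Move point to the first character the buffer's coding system cannot encode.
The command name is hypothetical, not something Emacs ships."
  (interactive)
  (let ((pos (unencodable-char-position (point-min) (point-max)
                                        buffer-file-coding-system)))
    (if pos
        (progn
          ;; Jump to the offending character and report it.
          (goto-char pos)
          (message "Cannot encode `%c' with %s"
                   (char-after pos) buffer-file-coding-system))
      (message "Every character is encodable with %s"
               buffer-file-coding-system))))
```

Once point is on the reported character, "C-u C-x =" shows its details.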

Which Emacs version is that, by the way?






* Re: How to keep character encoding in text file...
  2009-03-06  8:21     ` Pascal J. Bourguignon
  2009-03-06  9:44       ` Peter Dyballa
       [not found]       ` <mailman.2491.1236332716.31690.help-gnu-emacs@gnu.org>
@ 2009-03-06 10:57       ` Peter Dyballa
       [not found]       ` <mailman.2504.1236337060.31690.help-gnu-emacs@gnu.org>
  3 siblings, 0 replies; 12+ messages in thread
From: Peter Dyballa @ 2009-03-06 10:57 UTC (permalink / raw)
  To: Emacs Help List


> Marko Myllymaki writes:
>> Therefore if I enter "baz" in latin1 buffer, everything is okay, but
>> "foobar åäö" has some UTF-8 and it then forces buffer encoding to
>> UTF-8...


How do you produce the accented characters?

What is shown in *Help* buffer when you position the cursor on each  
of åäö and type on each C-u C-x =?

What is inserted, for example in *scratch* buffer, when you type C-q  
<any of the three åäö>?

--
Greetings

   Pete

      _o    o         o   o
    _<<     \\_/\_,   \\_ \\_/\_,
   (*)/(*) (*)   (*) (*) `-    (*)







* Re: How to keep character encoding in text file...
       [not found]       ` <mailman.2504.1236337060.31690.help-gnu-emacs@gnu.org>
@ 2009-03-07  9:05         ` Marko Myllymaki
  0 siblings, 0 replies; 12+ messages in thread
From: Marko Myllymaki @ 2009-03-07  9:05 UTC (permalink / raw)
  To: help-gnu-emacs

Peter Dyballa wrote:
> What is shown in *Help* buffer when you position the cursor on each of 
> åäö and type on each C-u C-x =?

   character: å (2277, #o4345, #x8e5, U+00E5)
     charset: latin-iso8859-1
              (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100.)
  code point: #x65
      syntax: w 	which means: word
    category: l:Latin
buffer code: #x81 #xE5
   file code: #xE5 (encoded by coding system iso-latin-1-unix)
     display: by this font (glyph code)
      -Misc-Fixed-Bold-R-Normal--13-120-75-75-C-80-ISO8859-1 (#xE5)

Okay, guess I was wrong. I know much more about emacs char encoding now.

I try now to reproduce the problem (if it appears again)... Maybe it was 
emacs-snapshot (v 23) which caused the problems... I changed back to 22.

Even pasting UTF-8 text into a latin1 buffer keeps the buffer encoding. Good!

Hopefully the problem does not reoccur when I'm very busy working :D


