unofficial mirror of help-gnu-emacs@gnu.org
* non-ASCII characters in auto-save files
@ 2012-06-25 12:46 grivet
  2012-06-25 16:56 ` Peter Dyballa
       [not found] ` <mailman.3451.1340643405.855.help-gnu-emacs@gnu.org>
  0 siblings, 2 replies; 12+ messages in thread
From: grivet @ 2012-06-25 12:46 UTC (permalink / raw)
  To: help-gnu-emacs


Hello,
I write a lot of text under WinXP/Emacs/LaTeX, in French. For some
unknown reason, Emacs sometimes crashes and
I am left with an auto-save file, say #myfile#.
My problem is that, in such a file, all non-ASCII characters are
mangled. I can open #myfile# in Emacs, but accented e's, for instance,
appear as \303\250 or as \201\250, and similarly for all accented letters.
If I try to save #myfile# to disk, Emacs tells me it cannot encode
these characters. Here is the answer I get when I do C-u C-x = on one
of the offending chars:

C-u C-x =

         character:   (4194216, #o17777650, #x3fffa8)
preferred charset: eight-bit (Raw bytes 128-255)
        code point: 0xA8
            syntax: w     which means: word
       buffer code: #xA8
         file code: not encodable by coding system emacs-mule-unix
           display: no font available
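
The escapes and the character code in this report can be checked outside Emacs. The following is a minimal Python sketch, not part of the original report. It assumes that the \303\250 pair is the UTF-8 encoding of an accented letter, and it relies on the Emacs convention of mapping raw bytes 128-255 to the character codes #x3FFF80-#x3FFFFF. The names `RAW_BYTE_BASE` and `raw_byte_of` are hypothetical.

```python
# Octal escapes as Emacs displays them: \303\250 is the byte pair 0xC3 0xA8.
pair = bytes([0o303, 0o250])

# Decoded as UTF-8, the pair is a single character.
decoded = pair.decode("utf-8")
print(hex(ord(decoded)))

# Assumption: Emacs raw-byte characters occupy #x3FFF80..#x3FFFFF.
RAW_BYTE_BASE = 0x3FFF00


def raw_byte_of(emacs_char: int) -> int:
    """Return the raw byte behind an Emacs 'eight-bit' character code."""
    if not 0x3FFF80 <= emacs_char <= 0x3FFFFF:
        raise ValueError("not a raw-byte character")
    return emacs_char - RAW_BYTE_BASE


# 4194216 is the character code reported by C-u C-x =.
print(hex(raw_byte_of(4194216)))
```

Under these assumptions the pair decodes to U+00E8 (e with grave accent), and character 4194216 is the raw byte 0xA8, that is, the second byte of the pair left undecoded.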

I have two questions:
     - Can Emacs be persuaded to encode auto-save files in a more 
useful manner ?
     - Is there a systematic way to convert #myfile# to iso-latin-9 (or 
iso-latin-1) code, other than painstakingly
searching and replacing offending chars ?

Thanks in advance for your time and help
JP Grivet





* Re: non-ASCII characters in auto-save files
  2012-06-25 12:46 non-ASCII characters in auto-save files grivet
@ 2012-06-25 16:56 ` Peter Dyballa
       [not found] ` <mailman.3451.1340643405.855.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Dyballa @ 2012-06-25 16:56 UTC (permalink / raw)
  To: grivet; +Cc: help-gnu-emacs


On 25.06.2012 at 14:46, grivet wrote:

> I have two questions:
>    - Can Emacs be persuaded to encode auto-save files in a more useful manner ?
>    - Is there a systematic way to convert #myfile# to iso-latin-9 (or iso-latin-1) code, other than painstakingly
> searching and replacing offending chars ?

I have one answer: use file-local variables!

For example in the header:

	%%% -*- mode: LaTeX; coding: iso-latin-9-unix; -*-

Or in the (AUCTeX) footer:

	%%% Local Variables:
	%%% mode: LaTeX
	%%% fill-column: 99999
	%%% coding: iso-latin-9
	%%% End:
	%

In the header you can combine the coding line with a time-stamp line, updated every time you save the file:

	%%%	Time-stamp: <2012-01-15 16:41:38 pete> 


With C-x RET c <encoding name> RET you can set the coding system for the immediately following command, for example the C-x C-f that reads and opens the auto-save file. You can use the same prefix before the command that saves the file.

--
Greetings

  Pete

There's no place like ~
			– (UNIX Guru)





* Re: non-ASCII characters in auto-save files
       [not found] ` <mailman.3451.1340643405.855.help-gnu-emacs@gnu.org>
@ 2012-06-25 22:40   ` Michael Heerdegen
  2012-06-25 23:11     ` Peter Dyballa
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Heerdegen @ 2012-06-25 22:40 UTC (permalink / raw)
  To: help-gnu-emacs

Peter Dyballa <Peter_Dyballa@Web.DE> writes:

> On 25.06.2012 at 14:46, grivet wrote:
>
> > I have two questions:
> >    - Can Emacs be persuaded to encode auto-save files in a more
> > useful manner ?
> >    - Is there a systematic way to convert #myfile# to iso-latin-9
> > (or iso-latin-1) code, other than painstakingly
> > searching and replacing offending chars ?
>
> I have one answer: use file-local variables!
>
> For example in the header:
>
> 	%%% -*- mode: LaTeX; coding: iso-latin-9-unix; -*-
>
> Or in the (AUCTeX) footer:
>
> 	%%% Local Variables:
> 	%%% mode: LaTeX
> 	%%% fill-column: 99999
> 	%%% coding: iso-latin-9
> 	%%% End:
> 	%

I'm no expert here, but - that doesn't work for autosave files (try
it!).  Looking at the code, it seems that autosaving handles coding
specially.  File local variables are respected when finding the autosave
file, but it is saved with a different encoding than the base file.

But grivet: Why do you have to open those autosave files?  I think the
standard way to use them is via `recover-file' or `recover-session'.
You shouldn't use them directly.


Michael.




* Re: non-ASCII characters in auto-save files
  2012-06-25 22:40   ` Michael Heerdegen
@ 2012-06-25 23:11     ` Peter Dyballa
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Dyballa @ 2012-06-25 23:11 UTC (permalink / raw)
  To: Michael Heerdegen; +Cc: help-gnu-emacs


On 26.06.2012 at 00:40, Michael Heerdegen wrote:

> I'm no expert here, but - that doesn't work for autosave files (try
> it!).  Looking at the code, it seems that autosaving handles coding
> specially.  File local variables are respected when finding the autosave
> file, but it is saved with a different encoding than the base file.

That's right! File-local variables can only prevent encoding ambiguities. (With auto-save and other files.)

--
With peaceful greetings

  Pete

The wise man said: "Never argue with an idiot. They bring you down to their level and beat you with experience."








* Re: non-ASCII characters in auto-save files
       [not found] <mailman.3448.1340636607.855.help-gnu-emacs@gnu.org>
@ 2012-06-25 23:21 ` Stefan Monnier
  2012-06-26 10:13   ` grivet
  2012-06-27 11:52   ` Xah Lee
  0 siblings, 2 replies; 12+ messages in thread
From: Stefan Monnier @ 2012-06-25 23:21 UTC (permalink / raw)
  To: help-gnu-emacs

>     - Can Emacs be persuaded to encode auto-save files in a more useful
> manner ?

No.  Auto-save files are encoded using Emacs's internal coding-system,
which makes the operation a lot more reliable and efficient (no need to
worry about the case where some char(s) can't be encoded, for example).

>     - Is there a systematic way to convert #myfile# to iso-latin-9 (or
> iso-latin-1) code, other than painstakingly searching and replacing
> offending chars ?

Just open the file with the right coding-system (typically, opening
"myfile" and then hitting M-x recover-this-file RET should do the
trick).


        Stefan



* Re: non-ASCII characters in auto-save files
  2012-06-25 23:21 ` Stefan Monnier
@ 2012-06-26 10:13   ` grivet
  2012-06-26 15:40     ` Eli Zaretskii
  2012-06-27 11:52   ` Xah Lee
  1 sibling, 1 reply; 12+ messages in thread
From: grivet @ 2012-06-26 10:13 UTC (permalink / raw)
  To: help-gnu-emacs


> >     - Is there a systematic way to convert #myfile# to iso-latin-9 (or
> > iso-latin-1) code, other than painstakingly searching and replacing
> > offending chars ?

> Just open the file with the right coding-system (typically, opening
> "myfile" and then hitting M-x recover-this-file RET should do the
> trick).
>
>
>          Stefan

#myfile# can be opened in this way but cannot be saved, as the coding 
system is not recognized.
JP Grivet






* Re: non-ASCII characters in auto-save files
  2012-06-26 10:13   ` grivet
@ 2012-06-26 15:40     ` Eli Zaretskii
  0 siblings, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2012-06-26 15:40 UTC (permalink / raw)
  To: help-gnu-emacs

> Date: Tue, 26 Jun 2012 12:13:21 +0200
> From: grivet <grivet@cnrs-orleans.fr>
> 
> 
>      - Is there a systematic way to convert #myfile# to iso-latin-9 (or
> iso-latin-1) code, other than painstakingly searching and replacing
> offending chars ?

Yes.  

   C-x RET c emacs-internal RET C-x C-f #myfile# RET
   C-x RET f latin-9 RET
   C-x C-s
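
For comparison, roughly the same conversion can be sketched outside Emacs in Python. This is an illustration under assumptions, not the recipe above: it assumes the auto-save file holds only plain UTF-8 (the Emacs internal encoding is a superset, so raw bytes or other non-UTF-8 sequences would make the strict decode fail). The function name `convert_autosave` and the file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def convert_autosave(src: str, dst: str, target: str = "iso8859_15") -> None:
    """Re-encode a UTF-8 compatible auto-save file as Latin-9.

    Raises UnicodeDecodeError if the file contains sequences outside
    standard UTF-8, and UnicodeEncodeError if a character has no
    Latin-9 equivalent. Nothing is written in either case.
    """
    text = Path(src).read_bytes().decode("utf-8")  # strict decode
    data = text.encode(target)                     # strict encode
    Path(dst).write_bytes(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "#myfile#")
        dst = os.path.join(tmp, "myfile")
        # Sample content: two accented letters stored as UTF-8 byte pairs.
        Path(src).write_bytes(b"caf\xc3\xa9 cr\xc3\xa8me\n")
        convert_autosave(src, dst)
        print(Path(dst).read_bytes())
```

Both steps are strict on purpose: a file that really does contain Emacs-only codes is reported as an error instead of being silently damaged, and in that case the Emacs recipe is the safer route.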

> > Just open the file with the right coding-system (typically, opening
> > "myfile" and then hitting M-x recover-this-file RET should do the
> > trick).
> >
> >
> >          Stefan
> #myfile# can be opened in this way but cannot be saved, as the coding 
> system is not recognized.

You should open myfile, not #myfile#.  Then Emacs will automatically
notice that the auto-save file for it exists, and offer to recover it.




* Re: non-ASCII characters in auto-save files
  2012-06-25 23:21 ` Stefan Monnier
  2012-06-26 10:13   ` grivet
@ 2012-06-27 11:52   ` Xah Lee
  2012-06-27 16:44     ` Eli Zaretskii
       [not found]     ` <mailman.3571.1340815454.855.help-gnu-emacs@gnu.org>
  1 sibling, 2 replies; 12+ messages in thread
From: Xah Lee @ 2012-06-27 11:52 UTC (permalink / raw)
  To: help-gnu-emacs

On Jun 25, 4:21 pm, Stefan Monnier <monn...@iro.umontreal.ca> wrote:
> >     - Can Emacs be persuaded to encode auto-save files in a more useful
> > manner ?
>
> No.  Auto-save files are encoded using Emacs's internal coding-system,
> which makes the operation a lot more reliable and efficient (no need to
> worry about the case where some char(s) can't be encoded, for example).


I thought Emacs 23/24 uses a superset of UTF-8 as its internal encoding?
So I thought those #autosave# files are encoded in UTF-8, and as
long as one opens them with UTF-8 decoding all chars would show. Is
this not correct?

(I turned off auto-save and auto backup a few years ago)

 Xah



* Re: non-ASCII characters in auto-save files
  2012-06-27 11:52   ` Xah Lee
@ 2012-06-27 16:44     ` Eli Zaretskii
       [not found]     ` <mailman.3571.1340815454.855.help-gnu-emacs@gnu.org>
  1 sibling, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2012-06-27 16:44 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Xah Lee <xahlee@gmail.com>
> Date: Wed, 27 Jun 2012 04:52:41 -0700 (PDT)
> 
> I thought Emacs 23/24 uses a superset of UTF-8 as its internal encoding?

It does.

> So I thought those #autosave# files are encoded in UTF-8, and as
> long as one opens them with UTF-8 decoding all chars would show. Is
> this not correct?

Not 100%, because it's a superset.  That is, there could be codes
there that UTF-8 will fail to decode.




* Re: non-ASCII characters in auto-save files
       [not found]     ` <mailman.3571.1340815454.855.help-gnu-emacs@gnu.org>
@ 2012-06-27 20:03       ` Xah Lee
  2012-06-28  1:07         ` Stefan Monnier
  0 siblings, 1 reply; 12+ messages in thread
From: Xah Lee @ 2012-06-27 20:03 UTC (permalink / raw)
  To: help-gnu-emacs

On Jun 27, 9:44 am, Eli Zaretskii <e...@gnu.org> wrote:
> > From: Xah Lee <xah...@gmail.com>
> > Date: Wed, 27 Jun 2012 04:52:41 -0700 (PDT)
>
> > I thought Emacs 23/24 uses a superset of UTF-8 as its internal encoding?
>
> It does.

Hi Eli, thanks for the confirmation.

> > So I thought those #autosave# files are encoded in UTF-8, and as
> > long as one opens them with UTF-8 decoding all chars would show. Is
> > this not correct?
>
> Not 100%, because it's a superset.  That is, there could be codes
> there that UTF-8 will fail to decode.

That's interesting. What are some examples? E.g., which chars in Emacs's
encoding differ from the UTF-8 encoding? I've been wondering why
it needs to be a superset.

 Xah



* Re: non-ASCII characters in auto-save files
  2012-06-27 20:03       ` Xah Lee
@ 2012-06-28  1:07         ` Stefan Monnier
  2012-06-28  2:58           ` Eli Zaretskii
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Monnier @ 2012-06-28  1:07 UTC (permalink / raw)
  To: help-gnu-emacs

> That's interesting. What are some examples? E.g., which chars in Emacs's
> encoding differ from the UTF-8 encoding? I've been wondering why
> it needs to be a superset.

The most obvious unavoidable ones are the chars that represent bytes.
So if you read a utf-8 file which has some invalid byte sequence, Emacs
will be able to save it back unchanged (in the buffer, the invalid
sequence is represented by the corresponding sequence of "byte chars").
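
A small Python sketch, offered only as an illustration, shows why such "byte chars" cannot live in standard UTF-8: the character code 4194216 reported earlier in this thread lies above U+10FFFF, the highest code point Unicode defines. The names `UNICODE_MAX` and `fits_in_unicode` are hypothetical.

```python
UNICODE_MAX = 0x10FFFF  # highest code point defined by Unicode


def fits_in_unicode(code: int) -> bool:
    """True if the code lies within the Unicode code point range."""
    return 0 <= code <= UNICODE_MAX


byte_char = 4194216  # the raw-byte character code from the original report

print(fits_in_unicode(0xE8))       # an ordinary accented letter
print(fits_in_unicode(byte_char))  # an Emacs "byte char"

# Python's own str type enforces the same upper limit.
try:
    chr(byte_char)
except ValueError as err:
    print("rejected:", err)
```

The first check succeeds and the second fails, which is consistent with the internal encoding needing to be a superset of UTF-8 to carry these characters.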


        Stefan



* Re: non-ASCII characters in auto-save files
  2012-06-28  1:07         ` Stefan Monnier
@ 2012-06-28  2:58           ` Eli Zaretskii
  0 siblings, 0 replies; 12+ messages in thread
From: Eli Zaretskii @ 2012-06-28  2:58 UTC (permalink / raw)
  To: help-gnu-emacs

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Wed, 27 Jun 2012 21:07:07 -0400
> 
> > That's interesting. What are some examples? E.g., which chars in Emacs's
> > encoding differ from the UTF-8 encoding? I've been wondering why
> > it needs to be a superset.
> 
> The most obvious unavoidable ones are the chars that represent bytes.

Yes, and then there are CJK characters that are not unified with their
Unicode codepoints, for historical and cultural reasons.  There's a more
detailed description in the ELisp manual, node "Text Representations".


