unofficial mirror of bug-gnu-emacs@gnu.org 
From: Vincent Lefevre <vincent@vinc17.net>
To: Eli Zaretskii <eliz@gnu.org>
Cc: 696026-forwarded@bugs.debian.org, 696026@bugs.debian.org,
	rlb@defaultvalue.org, 13505@debbugs.gnu.org
Subject: bug#13505: Bug#696026: emacs24: file corruption on saving
Date: Sun, 20 Jan 2013 23:10:08 +0100
Message-ID: <20130120221007.GG2695@xvii.vinc17.org>
In-Reply-To: <83bocjpm81.fsf@gnu.org>

On 2013-01-20 23:40:14 +0200, Eli Zaretskii wrote:
> > Date: Sun, 20 Jan 2013 22:25:08 +0100
> > From: Vincent Lefevre <vincent@vinc17.net>
> > Cc: Rob Browning <rlb@defaultvalue.org>, Kenichi Handa <handa@gnu.org>,
> > 	13505@debbugs.gnu.org, 696026-forwarded@bugs.debian.org,
> > 	696026@bugs.debian.org
> > 
> > On 2013-01-20 18:49:38 +0200, Eli Zaretskii wrote:
> > > Personally, I don't think there's a bug here.  It's a cockpit error.
> > 
> > Perhaps it isn't a bug at save time. But then, selecting a lossy
> > encoding by default when visiting the file is the bug (and really
> > a regression), particularly if this isn't clearly told to the user.
> 
> The encoding isn't lossy.

You said:

| The original encoded form of the characters as found on disk at
| visit time _cannot_ be recovered by saving with raw-text, because
| that encoded form is lost without a trace when the file is _visited_
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| and decoded into the internal representation.

This is exactly what "lossy" means.

By contrast, the utf-8 encoding doesn't seem to be lossy: Emacs
seems to handle files with invalid UTF-8 sequences without any loss.
So this encoding is safe, even if Emacs guesses the encoding wrongly.
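
To make the lossy/lossless distinction concrete, here is a minimal
sketch in Python (not Emacs code, and only an analogy to what Emacs
does internally). It assumes a made-up byte string containing two
stray bytes, and compares a decoder that replaces undecodable bytes
with one that keeps them recoverable (Python's "surrogateescape"
error handler). Only the second round-trips to the original bytes.

```python
# Hypothetical file contents: valid UTF-8 plus two stray bytes (0x80, 0x99).
original = "caf\u00e9 ".encode("utf-8") + b"\x80\x99" + b" end"

# Lossy decoding: each undecodable byte becomes U+FFFD, so the
# original byte values cannot be recovered when the text is saved.
lossy_text = original.decode("utf-8", errors="replace")
lossy_saved = lossy_text.encode("utf-8")

# Lossless decoding: each undecodable byte is mapped to a lone
# surrogate and mapped back to the same byte when encoding.
safe_text = original.decode("utf-8", errors="surrogateescape")
safe_saved = safe_text.encode("utf-8", errors="surrogateescape")

print(lossy_saved == original)  # False
print(safe_saved == original)   # True
```

In these terms, a decoding is "safe" when visiting and then saving
an unmodified file reproduces it byte for byte.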

> In any case, I don't really understand your proposal.  Suppose the
> file was indeed encoded in in-is13194-devanagari, would you argue then
> that selecting it would be incorrect or undesirable behavior?

If Emacs modifies the contents when saving the file, it would be
incorrect.

> > Actually this is related, since the lossy encoding becomes a real
> > problem only at save time (and for copy-paste I assume, though the
> > file doesn't get overwritten by that).
> 
> It is only a problem when you try to save or otherwise output it
> (e.g., send in an email).
> 
> But what you should do then is "C-x RET r raw-text RET", and recover.
> That is the only way to avoid corruption in files that use
> inconsistent encoding.

But Emacs should clearly tell the user what to do after C-x C-s and
clearly say when there can be data loss. Currently it says:

"These default coding systems were tried to encode text
in the buffer `file1':
  (in-is13194-devanagari-unix (2 . 2376) (3 . 4194176) (4 . 4194201)
  (5 . 2341) (6 . 2314) (12 . 2364)) (utf-8-unix (3 . 4194176) (4 .
  4194201))
However, each of them encountered characters it couldn't encode:
  in-is13194-devanagari-unix cannot encode these: [...]
  utf-8-unix cannot encode these: [...]"

This shouldn't be regarded as a problem by the user, because if Emacs
could read and interpret the file (and such characters have not been
added by the user), it should be able to save it.
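
The (position . character) pairs in that message are easier to read
once the character codes are decoded. The following Python sketch
does so, assuming the Emacs internal convention that codes
0x3FFF80..0x3FFFFF stand for raw bytes 0x80..0xFF and that other
codes are Unicode code points. The helper name `describe` is made up
for this illustration.

```python
import unicodedata

# Assumed Emacs convention: these codes represent raw 8-bit bytes.
RAW_BYTE_FIRST = 0x3FFF80  # stands for raw byte 0x80
RAW_BYTE_LAST = 0x3FFFFF   # stands for raw byte 0xFF


def describe(code):
    """Return a readable description of an Emacs character code."""
    if RAW_BYTE_FIRST <= code <= RAW_BYTE_LAST:
        return "raw byte 0x%02X" % (code - 0x3FFF00)
    return "U+%04X %s" % (code, unicodedata.name(chr(code), "<unnamed>"))


# Pairs copied from the in-is13194-devanagari-unix entry of the message.
pairs = [(2, 2376), (3, 4194176), (4, 4194201),
         (5, 2341), (6, 2314), (12, 2364)]

for position, code in pairs:
    print(position, describe(code))
```

Under that assumption, positions 3 and 4 hold the raw bytes 0x80 and
0x99, and the other positions hold characters from the Devanagari
block. The message itself gives no such explanation.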

Then Emacs says: "Select one of the safe coding systems listed below
[...]", but doesn't say that something has already been lost. So, the
words "safe coding systems" are really misleading.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)




