From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#13505: Bug#696026: emacs24: file corruption on saving
Date: Mon, 21 Jan 2013 19:55:20 +0200
Message-ID: <83vcaqo1yv.fsf@gnu.org>
References: <20121215223809.GA7549@xvii.vinc17.org>
	<877gn8ijgn.fsf@trouble.defaultvalue.org> <83obgjpzod.fsf@gnu.org>
	<20130120212508.GF2695@xvii.vinc17.org> <83bocjpm81.fsf@gnu.org>
	<20130120221007.GG2695@xvii.vinc17.org> <83a9s3p56p.fsf@gnu.org>
	<20130121041410.GJ2695@xvii.vinc17.org>
In-reply-to: <20130121041410.GJ2695@xvii.vinc17.org>
Reply-To: Eli Zaretskii
Cc: 696026-forwarded@bugs.debian.org, 696026@bugs.debian.org,
	rlb@defaultvalue.org, 13505@debbugs.gnu.org
To: Vincent Lefevre
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"

> Date: Mon, 21 Jan 2013 05:14:10 +0100
> From: Vincent Lefevre
> Cc: rlb@defaultvalue.org, handa@gnu.org, 13505@debbugs.gnu.org,
> 	696026-forwarded@bugs.debian.org, 696026@bugs.debian.org
>
> On 2013-01-21 05:48:14 +0200, Eli Zaretskii wrote:
> > > You said:
> > >
> > > | The original encoded form of the characters as found on disk at
> > > | visit time _cannot_ be recovered by saving with raw-text, because
> > > | that encoded form is lost without a trace when the file is _visited_
> > >                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > | and decoded into the internal representation.
> > >
> > > This is what lossy is.
> >
> > In that sense, every encoding except no-conversion is lossy.
>
> Even 8-bit encodings such as latin-1?

Yes.  When latin-1 characters are decoded (as part of visiting a file),
they are converted to the internal representation, and cease to be
single 8-bit bytes.

> > > On the opposite, the utf-8 encoding doesn't seem to be lossy: Emacs
> > > seems to handle files with invalid UTF-8 sequences without any loss.
> > > So, this encoding is safe, even if Emacs wrongly guess the encoding.
> >
> > No, it isn't, although you could get away with it most of the time.
>
> Could you give an example where one loses data with the utf-8 encoding?

E.g., in your test file, the byte whose value is 0x80 is converted to
0x3fff80 when the file is read into a buffer.

Perhaps by "lossless" you mean "reversible", in the sense that saving
the same buffer will perform the reverse conversion.
In that case, even in-is13194-devanagari-unix is reversible: if you
type this encoding when Emacs prompts you to select one of the coding
systems, then you get the same file on disk with no corruption
whatsoever.

> > > But Emacs should clearly tell the user what to do after C-x C-s and
> > > clearly say when there can be data loss.
> >
> > At save time, "data loss" is wrt what's in the buffer.  In that sense,
> > the encodings Emacs suggested don't lose any data.
>
> "data loss" is the difference between the original file and the saved
> file.

But what do you want Emacs to do with this?  When you save the buffer,
the original file might be different or no longer available (or not
accessible even in principle, e.g. if the data came from a
subprocess).  These issues should be detected at file visit time, if
at all, not at buffer save time.

> > > Then Emacs says: "Select one of the safe coding systems listed below
> > > [...]", but doesn't say that something has already been lost. So, the
> > > words "safe coding systems" are really misleading.
> >
> > It's misleading because you misunderstand what is "safe" at buffer
> > save time.
>
> No, it's misleading because Emacs didn't say that data were lost
> when visiting the file.

Let's be constructive here.  Please suggest some practical way for
Emacs to handle this situation better.

For the record, here are the various alternative ways Emacs supports
the use case you described, when a file with inconsistent encoding
needs to be repaired manually:

 . Visit the file with "M-x find-file-literally RET".  This yields a
   unibyte buffer, where each byte stands for itself, and which you
   can edit without risking en-/decoding issues.

 . Visit the file normally, then type "M-x hexl-mode RET" (or use
   "M-x hexl-find-file RET" to visit it in the first place).  This
   revisits (or visits) the file in a unibyte buffer, and in addition
   lets you edit the binary stuff regardless of its graphic
   representation.

 . After visiting the file normally and noticing that it contains
   weird characters, or after being prompted to select a coding system
   when saving the buffer, type "C-x RET r raw-text RET" to revisit
   the file in raw-text encoding.  Then edit the bytes and save the
   file.

These alternatives are listed in descending order of priority (IMO).
There are more ways to deal with this, but the rest are more
complicated and dangerous, so I don't mention them here.  (It is also
possible that you will find the second alternative more convenient
than the first one.)
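
As a rough illustration of the "lossy" versus "reversible" distinction
discussed above, here is a small sketch in Python (not Emacs Lisp).
It is only an analogy: Python's "surrogateescape" error handler stands
in for the way Emacs maps an undecodable byte such as 0x80 to an
internal character, and the byte string is a made-up sample, not the
test file from this bug report.

```python
# Analogy only: this is not how Emacs is implemented.
# Hypothetical "file contents" with two bytes that are not valid UTF-8.
original = b"caf\xe9 \x80 ok"

# "Visiting": decoding replaces the on-disk bytes with code points.
# Undecodable bytes become lone surrogates (0xE9 -> U+DCE9, 0x80 -> U+DC80),
# so the in-memory form no longer consists of the original bytes.
text = original.decode("utf-8", errors="surrogateescape")
escaped = [hex(ord(ch)) for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF]

# "Saving" with the same coding performs the reverse conversion exactly.
same_codec = text.encode("utf-8", errors="surrogateescape")

# "Saving" with a different conversion does not reproduce the original bytes.
other_codec = text.encode("utf-8", errors="replace")

print(escaped)
print(same_codec == original)
print(other_codec == original)
```

Working directly on "original" as a bytes object, without decoding at
all, loosely corresponds to the first alternative above (a unibyte
buffer where each byte stands for itself).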