From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#13505: Bug#696026: emacs24: file corruption on saving
Date: Sun, 20 Jan 2013 18:49:38 +0200
Message-ID: <83obgjpzod.fsf@gnu.org>
References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org>
In-reply-to: <877gn8ijgn.fsf@trouble.defaultvalue.org>
To: Rob Browning, Kenichi Handa
Cc: 696026-forwarded@bugs.debian.org, vincent@vinc17.net, 696026@bugs.debian.org, 13505@debbugs.gnu.org

> From: Rob Browning
> Date: Sat, 19 Jan 2013 22:09:28 -0600
> Cc: 696026-forwarded@bugs.debian.org, Vincent Lefevre,
>  696026@bugs.debian.org
>
> Vincent Lefevre writes:
>
> > Package: emacs24
> > Version: 24.2+1-1
> > Severity: grave
> > Justification: causes non-serious data loss
> >
> > The file "file1" (attached) has the following contents:
> >
> > 00000000  6c e2 80 99 c3 a9 0a 74  65 73 74 e9 0a  |l......test..|
> >
> > 1. Open "file1" with "emacs -Q". It is regarded as
> >    an in-is13194-devanagari-unix file.
> >
> > 2. Type M-: (set-buffer-modified-p t) to mark the buffer as modified
> >    (so that one can save it).
> >
> > 3. Save the file with C-x C-s. It is proposed:
> >
> > [...]
> > Select one of the safe coding systems listed below,
> > or cancel the writing with C-g and edit the buffer
> >    to remove or modify the problematic characters,
> > or specify any other coding system (and risk losing
> >    the problematic characters).
> >
> >   raw-text emacs-mule no-conversion
> >
> > 4. Choose raw-text (the default) or no-conversion. One can assume
> >    that the file will not be modified. But it gets corrupted: one
> >    obtains a file "file2" (attached) with the following contents:
> >
> > 00000000  6c e0 a5 88 80 99 e0 a4  a5 e0 a4 8a 0a 74 65 73  |l............tes|
> > 00000010  74 e0 a4 bc 0a                                    |t....|
> >
> > Note: Actually "file1" has mixed UTF-8 and ISO-8859-1 contents due to
> > a user error. But due to this bug, an attempt to fix the problem with
> > Emacs makes things even worse! BTW, I had the same problem in the past
> > when attempting to edit an mbox file with Emacs (in this case, having
> > mixed UTF-8 and ISO-8859-1 contents is normal). How Emacs interprets
> > such contents doesn't matter, but by default, it mustn't corrupt the
> > file on saving.
> >
> > There is no such problem with GNU Emacs 23.4.1 (Debian package
> > emacs23 23.4+1-4).

First, this isn't really a regression: Emacs 23 has the same "problem".
It's just that Emacs 23 doesn't autodetect in-is13194-devanagari in this
file, while Emacs 24 does. If you say "C-x RET c raw-text RET C-x C-f"
to visit this file in Emacs 24, the problem will be gone, which is
exactly what happens in Emacs 23, because it visits the file in raw-text
to begin with. Conversely, if you use "C-x RET c in-is13194-devanagari
RET C-x C-f" to visit the file in Emacs 23, you will get the same
"problem" saving it.

I didn't research the reason why Emacs 24 autodetects this encoding, and
whether this is on purpose. Perhaps Handa-san could tell.

More to the point: there seems to be a fundamental misunderstanding here
regarding the effect of selecting an encoding at save time. It sounds
like the OP thought that selecting a "literal" encoding, such as
raw-text, which is supposed to leave the binary stream unaltered (apart
from the EOL format), will ensure that a buffer will be saved exactly as
it was originally found on disk. But this is false.

What raw-text and no-conversion do is to write out the _internal_
representation of each character without any conversions. The original
encoded form of the characters as found on disk at visit time _cannot_
be recovered by saving with raw-text, because that encoded form is lost
without a trace when the file is _visited_ and decoded into the internal
representation. The only information that's left is the coding-system
used to decode the characters. But since the file's encoding in this
case is inconsistent, that coding-system cannot be used to save it back
(Emacs will not let you do so, as demonstrated in the report), and
therefore the original form cannot be recovered this way.

What the user should do to avoid this data loss is prevent the incorrect
decoding of the file's contents when the file is visited. To this end,
the file should be visited with no-conversion or raw-text, using
"C-x RET c raw-text RET C-x C-f".
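
To make the byte-level effect concrete, here is a small Python sketch
that mimics the visit-then-save sequence on the bytes from the report.
It is an illustration only, not Emacs code. The names toy_visit,
toy_save_raw and INFERRED_TABLE are hypothetical, the byte-to-character
table is inferred by comparing the two hex dumps rather than taken from
the real in-is13194-devanagari tables, and Python's "surrogateescape"
handler merely stands in for the way undecodable bytes are carried
through. It also assumes, as the second dump suggests, that decoded
characters are written out in UTF-8 form.

```python
# Bytes of "file1" and "file2", copied from the hex dumps in the report.
FILE1 = bytes.fromhex("6c e2 80 99 c3 a9 0a 74 65 73 74 e9 0a")
FILE2 = bytes.fromhex(
    "6c e0 a5 88 80 99 e0 a4 a5 e0 a4 8a 0a 74 65 73 74 e0 a4 bc 0a"
)

# ASSUMPTION: byte-to-character pairs inferred by comparing the two dumps;
# this is not the real in-is13194-devanagari decoding table.
INFERRED_TABLE = {0xE2: "\u0948", 0xC3: "\u0925", 0xA9: "\u090a", 0xE9: "\u093c"}


def toy_visit(data: bytes) -> str:
    """Decode bytes into characters, the way visiting a file does."""
    chars = []
    for byte in data:
        if byte < 0x80:
            chars.append(chr(byte))             # ASCII passes through
        elif byte in INFERRED_TABLE:
            chars.append(INFERRED_TABLE[byte])  # byte becomes a character
        else:
            chars.append(chr(0xDC00 + byte))    # undecodable byte kept raw
    return "".join(chars)


def toy_save_raw(text: str) -> bytes:
    """Write out the decoded characters without converting them back."""
    return text.encode("utf-8", errors="surrogateescape")


saved = toy_save_raw(toy_visit(FILE1))
assert saved == FILE2  # same bytes as the corrupted file in the report
assert saved != FILE1  # the original byte sequence is gone
```

Once toy_visit has turned 0xE2 into a character, nothing in the decoded
text records that the character came from a single byte, so a save that
does no conversion cannot put that byte back.
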
Once the file has been visited with raw-text, it can be repaired and
written back using the same raw-text encoding. If the inconsistent
encoding is not noticed until some time after the file is visited, the
user should use "C-x RET r raw-text RET" to re-visit the file using
raw-text.

IOW, only selecting the appropriate encoding _at_visit_time_ can prevent
data loss in these cases. The expectation that "Emacs mustn't corrupt
the file on saving" when the file has inconsistent encoding and was
decoded with anything but raw-text or no-conversion is unjustified.

Personally, I don't think there's a bug here. It's a cockpit error.
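
For comparison, the same principle (keep the bytes undecoded, repair,
write bytes back) can be sketched in Python. This is a hedged example,
not a recipe from the report: the helper name repair_mixed is
hypothetical, and the repair policy (any line that is not valid UTF-8 is
assumed to be ISO-8859-1) is an assumption based on the OP's description
of the file as mixed UTF-8 and ISO-8859-1.

```python
def repair_mixed(data: bytes) -> bytes:
    """Re-encode each line that is not valid UTF-8 from ISO-8859-1 to UTF-8.

    ASSUMPTION: a line that fails UTF-8 decoding is ISO-8859-1.
    """
    fixed = []
    for line in data.splitlines(keepends=True):
        try:
            line.decode("utf-8")  # already valid UTF-8: keep the bytes as-is
            fixed.append(line)
        except UnicodeDecodeError:
            fixed.append(line.decode("iso-8859-1").encode("utf-8"))
    return b"".join(fixed)


# Bytes of "file1" from the report, handled as bytes from start to finish.
FILE1 = bytes.fromhex("6c e2 80 99 c3 a9 0a 74 65 73 74 e9 0a")
repaired = repair_mixed(FILE1)
assert repaired == "l\u2019\u00e9\ntest\u00e9\n".encode("utf-8")
```

With a real file, the data would be read and written in binary mode
(open(path, "rb") and open(path, "wb")), so that no decoding happens
behind the program's back.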