From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#17343: 24.2; Exponential growth of files using raw-mode
Date: Fri, 25 Apr 2014 10:13:29 +0300
Message-ID: <83oazpq3ty.fsf@gnu.org>
References: <21337.24289.430068.104422@gargle.gargle.HOWL>
To: Jeremy Barbay
Cc: 17343@debbugs.gnu.org

> Date: Thu, 24 Apr 2014 15:58:41 -0300
> From: Jeremy Barbay
>
> Following the short recipe below shows how a user saving files in "raw
> mode" could end up with files doubling in size each time they are
> saved, if they follow Emacs' suggestion to save them in raw mode:
>
> * Recipe:
>
> 1. Save the following line in a file "testAccentsMinimal.txt":
>
> Nà¥\206à¤\206\206à¥\206
>
> 2. Repeatedly,
>
> 0) measure the size of the file (wc -c testAccentsMinimal.txt);
> 1) open Emacs, loading the file (emacs -q testAccentsMinimal.txt);
> 2) insert and delete a character in it (manually);
> 3) save it, selecting the suggested raw encoding (manually);
> 4) quit Emacs (or force a reload of the file).
>
> * Result:
>
> This should give something akin to the following, where one can see
> the size of the file growing exponentially with the number of saves.
>
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 11 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 19 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 35 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 67 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 131 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 259 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 515 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 1027 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -Q testAccentsMinimal.txt
> 2051 testAccentsMinimal.txt
>
> * (Tentative) Explanation:
>
> - Even though the file is saved in "raw" mode, it is read in another
>   mode which prefixes the "special" characters with a Unicode code.
> - Due to symbols from incompatible encodings, Emacs is confused about
>   which encoding to use for saving and asks the user about it.
>
> * Why it matters:
>
> - The faulty sequence above arose naturally from copy-pasting text
>   from various webpages (containing accented characters) into the
>   same document, and was identified when some files grew too large.
> - Files (e.g. of notes) end up doubling in size with each edit, until
>   they fill the memory and/or hard drive, slow down the system and
>   make Emacs complain about the size of the file.
>
> * (Potential) Solutions:
>
> - When saving a file with conflicting encodings, instead of merely
>   suggesting the raw encoding, add an option to "clean" the file
>   rather than just saving it in raw mode, for instance by projecting
>   the file onto an encoding, deleting all symbols that are
>   incompatible with it.
>
> I think I reported this bug a year ago in Emacs 23 and was told at
> the time that it would be fixed in the next version (24), but I
> recently noticed that this undesirable behavior is still there :(

It's not a bug. When you modify a file, its size can grow, sometimes a
lot, due to a change in encoding. This is intended behavior.

To avoid the problem in the first place, once you discover that the
file was visited with raw-text encoding, use "C-x RET r"
(revert-buffer-with-coding-system) to re-visit the buffer in the
encoding you think is correct, and then manually fix the bad
sequences. Then the growth will not happen.
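The sizes in the reported session are not just roughly doubling: each one is exactly twice the previous size minus 3 bytes, so everything above a fixed 3 bytes doubles on every save, which gives size(n) = 2^(n+3) + 3 for the n-th measurement counting from 0. The short Python check below only restates the nine numbers from the recipe; it says nothing about which bytes Emacs actually rewrites, and the 20-cycle figure is an extrapolation, not a measurement.

```python
# File sizes (bytes) reported by `wc -c` in the recipe, one per save cycle.
SIZES = [11, 19, 35, 67, 131, 259, 515, 1027, 2051]


def follows_recurrence(sizes):
    """True if every size is exactly 2 * previous - 3."""
    return all(nxt == 2 * cur - 3 for cur, nxt in zip(sizes, sizes[1:]))


def closed_form(n):
    """Size after n save cycles, from s_0 = 11 and s_{n+1} = 2 * s_n - 3."""
    return 2 ** (n + 3) + 3


if __name__ == "__main__":
    print(follows_recurrence(SIZES))                                # True
    print([closed_form(n) for n in range(len(SIZES))] == SIZES)     # True
    # Extrapolated, not measured: size after 20 cycles (about 8 MiB).
    print(closed_form(20))                                          # 8388611
```

At this rate a file passes a megabyte after about 17 saves, which fits the report that notes files eventually became too large to handle comfortably.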
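The advice above ends with manually fixing the bad sequences. As an illustration outside Emacs, the Python sketch below lists the byte offsets that are not valid UTF-8, which is one way to locate what needs fixing. It assumes the escaped line in the recipe stands for the 11 bytes `4E E0 A5 86 E0 A4 86 86 E0 A5 86` (the `à¥`/`à¤` pairs being UTF-8 lead bytes displayed as Latin-1, and `\206` being octal for byte 0x86), which matches the reported initial size of 11 bytes. The helper name `invalid_utf8_offsets` is hypothetical, and dropping the bad bytes with `errors="ignore"` is only a rough analogue of the "clean" option proposed in the report, not an Emacs feature.

```python
def invalid_utf8_offsets(data: bytes) -> list[int]:
    """Return the offsets of bytes that cannot be decoded as UTF-8."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")  # strict decoding raises at the first bad byte
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            # err.start/err.end index into data[pos:], so shift them back.
            bad.extend(range(pos + err.start, pos + err.end))
            pos += err.end  # resume just after the offending bytes
    return bad


# Assumed byte content of testAccentsMinimal.txt from the recipe (11 bytes).
SAMPLE = b"N\xe0\xa5\x86\xe0\xa4\x86\x86\xe0\xa5\x86"

if __name__ == "__main__":
    print(len(SAMPLE), invalid_utf8_offsets(SAMPLE))  # 11 [7]
    # Drop undecodable bytes, then re-encode: the stray 0x86 disappears.
    cleaned = SAMPLE.decode("utf-8", errors="ignore").encode("utf-8")
    print(len(cleaned))  # 10
```

Under that assumption only the lone 0x86 at offset 7 is invalid; the rest is "N" followed by three Devanagari characters, so a re-visit in UTF-8 plus deleting that one byte would leave a file that saves without growing.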