From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Barbay
Newsgroups: gmane.emacs.bugs
Subject: bug#17343: 24.2; Exponential growth of files using raw-mode
Date: Thu, 24 Apr 2014 15:58:41 -0300
Message-ID: <21337.24289.430068.104422@gargle.gargle.HOWL>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
To: 17343@debbugs.gnu.org
X-GNU-PR-Message: report 17343
X-GNU-PR-Package: emacs
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.org gmane.emacs.bugs:88282

Hi.

The short recipe below shows how a user can end up with a file whose
size nearly doubles on each save when following Emacs' suggestion to
save it in raw mode:

* Recipe:

  1. Save the following line in a file "testAccentsMinimal.txt"

     N=C3=A0=C2=A5=86=C3=A0=C2=A4=86=86=C3=A0=C2=A5=86

  2. Repeatedly,
     0) measure the size of the file (wc -c testAccentsMinimal.txt);
     1) open Emacs, loading the file (emacs -q testAccentsMinimal.txt);
     2) insert and delete a character in it (manually);
     3) save it, selecting the suggested raw encoding (manually);
     4) quit Emacs (or force the reload of the file).

* Result:

  This should give something akin to the following, where one can see
  the size of the file growing exponentially with the number of saves.

  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  11 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  19 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  35 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  67 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  131 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  259 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  515 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
  1027 testAccentsMinimal.txt
  >wc -c testAccentsMinimal.txt ; emacs -Q testAccentsMinimal.txt
  2051 testAccentsMinimal.txt

* (Tentative) Explanation:

  - Even though the file is saved in "raw" mode, it is read in another
    mode, which prefixes each "special" character with a Unicode code.
  - Because the file contains symbols from incompatible encodings,
    Emacs is confused about which encoding to use for saving and asks
    the user about it.

* Why it matters:

  - The faulty sequence above occurred naturally from copy-pasting from
    various webpages (containing accented characters) into the same
    document, and was identified when some files grew too large.
  - Files (e.g. of notes) end up doubling in size at each edit, until
    they fill the memory and/or hard drive, slow down the system and
    make Emacs complain about the size of the file.

* (Potential) Solutions:

  - When saving a file with conflicting encodings, instead of merely
    suggesting the raw encoding, add an option to "clean" the file,
    for instance by projecting it onto an encoding and deleting all
    symbols which are incompatible with it.

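As a rough illustration of the "clean" option suggested above, here is
a minimal Python sketch (not Emacs code) that projects a byte string
onto UTF-8 by dropping every byte that does not belong to a valid
UTF-8 sequence. The function name project_to_utf8 and the sample bytes
are hypothetical, chosen only for this example; they are not taken
from Emacs or from the file in the recipe.

```python
def project_to_utf8(raw: bytes) -> bytes:
    """Return raw with every byte that is not valid UTF-8 removed."""
    # errors="ignore" silently discards undecodable bytes.
    text = raw.decode("utf-8", errors="ignore")
    return text.encode("utf-8")


# Hypothetical sample: valid UTF-8 sequences plus one stray 0x86 byte.
sample = b"N\xe0\xa5\x86\xe0\xa4\x86\x86\xe0\xa5\x86"
cleaned = project_to_utf8(sample)

# The cleaned data is stable: cleaning it again changes nothing.
print(len(sample), len(cleaned))
print(project_to_utf8(cleaned) == cleaned)
```

Such a projection loses information (the stray byte is gone), so it
would presumably have to remain an explicit choice offered to the
user rather than a default.
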
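The sizes listed under "Result" can be checked for the claimed
exponential growth. The short Python sketch below only restates the
reported numbers; the closed form 3 + 8 * 2**n is an observation
fitted to those numbers, not a statement about what Emacs does
internally.

```python
# File sizes in bytes as reported in the transcript, one per save cycle.
sizes = [11, 19, 35, 67, 131, 259, 515, 1027, 2051]

# Observation fitted to the data: 8 bytes double on every save while
# 3 bytes stay fixed, giving 3 + 8 * 2**n after n saves.
predicted = [3 + 8 * 2**n for n in range(len(sizes))]

# Growth between consecutive measurements.
differences = [b - a for a, b in zip(sizes, sizes[1:])]

print(predicted == sizes)
print(differences)
```

Under this fit, each save adds twice as many bytes as the previous
one, which is consistent with files "doubling in size" once they are
large enough for the fixed part to be negligible.
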
I think that I reported this bug one year ago against Emacs 23 and was
told at the time that it would be solved in the next version (24), but
I noticed recently that this undesirable behavior is still there :(

I hope it helps.

-- 
Jeremy (http://www.dcc.uchile.cl/~jbarbay)


In GNU Emacs 24.2.1 (x86_64-unknown-linux-gnu, X toolkit, Xaw scroll bars)
 of 2013-02-27 on raven
Windowing system distributor `The X.Org Foundation', version 11.0.11300000
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: en_US.UTF-8
  value of $LC_NUMERIC: en_US.UTF-8
  value of $LC_TIME: en_US.UTF-8
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

Major mode: Text

Minor modes in effect:
  tooltip-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Loading vc-git...done
Scanning for dabbrevs...done
dabbrev-expand: No dynamic expansion for `Expo' found

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr dabbrev emacsbug message format-spec
rfc822 mml easymenu mml-sec mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045
ietf-drums mm-util mail-prsvr mail-utils vc-git ind-util regexp-opt
time-date tooltip ediff-hook vc-hooks lisp-float-type mwheel x-win x-dnd
tool-bar dnd fontset image fringe lisp-mode register page menu-bar
rfn-eshadow timer select scroll-bar mouse jit-lock font-lock syntax
facemenu font-core frame cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese hebrew greek romanian slovak
czech european ethiopic indian cyrillic chinese case-table epa-hook
jka-cmpr-hook help simple abbrev minibuffer loaddefs button faces
cus-face files text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget hashtable-print-readable backquote
make-network-process dbusbind dynamic-setting system-font-setting
font-render-setting x-toolkit x multi-tty emacs)