From: Eli Zaretskii <eliz@gnu.org>
To: Jeremy Barbay <jbarbay@dcc.uchile.cl>
Cc: 17343@debbugs.gnu.org
Subject: bug#17343: 24.2; Exponential growth of files using raw-mode
Date: Fri, 25 Apr 2014 10:13:29 +0300
Message-ID: <83oazpq3ty.fsf@gnu.org>
In-Reply-To: <21337.24289.430068.104422@gargle.gargle.HOWL>
> Date: Thu, 24 Apr 2014 15:58:41 -0300
> From: Jeremy Barbay <jbarbay@dcc.uchile.cl>
>
> The short recipe below shows how a user who follows Emacs's suggestion
> to save a file in "raw mode" can end up with a file that doubles in
> size each time it is saved:
>
> * Recipe:
>
> 1. Save the following line in a file "testAccentsMinimal.txt"
>
> Nà¥\206à¤\206\206à¥\206
>
> 2. Repeatedly,
>
> 0) measure the size of the file (wc -c testAccentsMinimal.txt);
> 1) open emacs loading the file (emacs -q testAccentsMinimal.txt);
> 2) insert and delete a character in it (manually);
> 3) save it selecting the suggested raw encoding (manually);
> 4) quit emacs (or force the reload of the file).
>
> * Result:
>
> This should give something akin to the following, where one can see
> the size of the file growing exponentially with the number of saves.
>
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 11 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 19 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 35 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 67 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 131 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 259 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 515 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
> 1027 testAccentsMinimal.txt
> >wc -c testAccentsMinimal.txt ; emacs -Q testAccentsMinimal.txt
> 2051 testAccentsMinimal.txt
>
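As a quick sanity check on the sizes in the transcript above, here is a minimal Python sketch. The helper name `fixed_part` is made up for illustration. It tests whether the sizes obey s[n+1] = 2*s[n] - c for a single constant c, which would mean that c bytes stay fixed while the rest doubles on every save.

```python
# Sizes copied from the "wc -c" transcript quoted above.
observed = [11, 19, 35, 67, 131, 259, 515, 1027, 2051]


def fixed_part(sizes):
    """Return the set of values c = 2*s[n] - s[n+1] over consecutive sizes.

    A single-element result means the series follows s[n+1] = 2*s[n] - c.
    (Hypothetical helper, not part of Emacs or of the original report.)
    """
    return {2 * a - b for a, b in zip(sizes, sizes[1:])}


print(fixed_part(observed))  # {3}

# Equivalent closed form: 8 bytes double on each save, 3 bytes never change.
print(all(s == 8 * 2**n + 3 for n, s in enumerate(observed)))  # True
```

So the transcript is consistent with 8 bytes of the file doubling on each save while 3 bytes stay the same size. This says nothing about which bytes those are.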
> * (Tentative) Explanation:
>
> - Even though the file is saved in "raw" mode, it is read in another
> mode which prefixes the "special" characters with a Unicode code.
> - Due to symbols from incompatible encodings, emacs is confused about
> which encoding to use for saving and asks the user about it.
>
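To illustrate the kind of mechanism the tentative explanation describes, here is a minimal Python sketch of an analogous round trip. It assumes, purely for illustration, that every save decodes the bytes as Latin-1 and writes them back as UTF-8. This is not claimed to be what Emacs does internally, and the sample bytes are made up rather than taken from the actual file.

```python
def resave(data: bytes) -> bytes:
    """One hypothetical bad round trip: read as Latin-1, write as UTF-8.

    ASCII bytes are unchanged; every byte >= 0x80 becomes two bytes,
    both of which are again >= 0x80.
    """
    return data.decode("latin-1").encode("utf-8")


def sizes_after_saves(data: bytes, saves: int) -> list:
    """Return the file size before the first save and after each save."""
    sizes = [len(data)]
    for _ in range(saves):
        data = resave(data)
        sizes.append(len(data))
    return sizes


# Made-up 11-byte sample: 3 ASCII bytes plus 8 non-ASCII bytes.
sample = b"N" + b"\x86" * 8 + b".\n"

print(sizes_after_saves(sample, 8))
# [11, 19, 35, 67, 131, 259, 515, 1027, 2051]
```

Under that assumption the non-ASCII part doubles on every save because each new byte is itself non-ASCII, so the next round trip expands it again.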
> * Why it matters:
>
> - The faulty sequence above occurred naturally when copying and pasting
> from various web pages (containing accented characters) into the same
> document, and was identified when some files grew too large.
> - Files (e.g. of notes) end up doubling in size at each edit, until
> they fill the memory and/or hard drive, slow down the system, and
> make Emacs complain about the size of the file.
>
> * (Potential) Solutions:
>
> - when saving a file with conflicting encodings, instead of merely
> suggesting the raw encoding, add an option to "clean" the file,
> for instance by projecting it onto one encoding and deleting all
> symbols which are incompatible with that encoding.
>
> I think that I reported this bug 1 year ago in Emacs 23 and was told
> at the time that it would be solved in the next version (24), but I
> noticed recently that this undesirable behavior is still there :(
It's not a bug. When you modify a file, its size can grow, sometimes
a lot, due to a change in encoding. This is intended behavior.
To avoid the problem in the first place, once you discover that the
file was visited with raw-text encoding, use "C-x RET r"
(revert-buffer-with-coding-system) to re-visit the file in the
encoding you think is correct, and then manually fix the bad
sequences. Then the growth will not happen.
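For locating the bad sequences outside Emacs before fixing them by hand, here is a minimal Python sketch. It assumes the intended encoding is UTF-8, which is only an assumption since the report does not say which encoding is correct, and the function name `invalid_utf8_spans` is made up for illustration.

```python
def invalid_utf8_spans(data: bytes) -> list:
    """Return (start, end) byte offsets of runs that are not valid UTF-8.

    Hypothetical helper: it decodes repeatedly and records where each
    UnicodeDecodeError points, then resumes after the offending bytes.
    """
    spans = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice data[pos:].
            start, end = pos + err.start, pos + err.end
            spans.append((start, end))
            pos = end
    return spans


# "café" is valid UTF-8; the stray 0x86 byte at offset 6 is not.
print(invalid_utf8_spans(b"caf\xc3\xa9 \x86 ok"))  # [(6, 7)]
```

To use it on a file, read the file in binary mode and pass the resulting bytes. The offsets it returns show where to look once the file has been re-visited in the intended encoding.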