unofficial mirror of bug-gnu-emacs@gnu.org 
From: Eli Zaretskii <eliz@gnu.org>
To: Jeremy Barbay <jbarbay@dcc.uchile.cl>
Cc: 17343@debbugs.gnu.org
Subject: bug#17343: 24.2; Exponential growth of files using raw-mode
Date: Fri, 25 Apr 2014 10:13:29 +0300
Message-ID: <83oazpq3ty.fsf@gnu.org>
In-Reply-To: <21337.24289.430068.104422@gargle.gargle.HOWL>

> Date: Thu, 24 Apr 2014 15:58:41 -0300
> From: Jeremy Barbay <jbarbay@dcc.uchile.cl>
> 
> The short recipe below shows how a user can end up with a file that
> nearly doubles in size on every save when following Emacs' suggestion
> to save it in raw mode:
> 
> * Recipe:
> 
>   1. Save the following line in a file "testAccentsMinimal.txt"
> 
>   Nà¥\206à¤\206\206à¥\206
> 
>   2. Repeatedly, 
> 
>      0) measure the size of the file (wc -c testAccentsMinimal.txt); 
>      1) open emacs loading the file (emacs -q testAccentsMinimal.txt);
>      2) insert and delete a character in it (manually);
>      3) save it selecting the suggested raw encoding (manually);
>      4) quit emacs (or force the reload of the file).
> 
> * Result:
> 
>   This should give something akin to the following, where one can see
>   the size of the file growing exponentially with the number of saves.
> 
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   11 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   19 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   35 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   67 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   131 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   259 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   515 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   1027 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -Q testAccentsMinimal.txt
>   2051 testAccentsMinimal.txt
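
The sizes listed above fit a simple recurrence: each save yields twice the previous size minus 3, which is the same as 8 * 2^k + 3 bytes after k saves. The short Python check below confirms this against the reported numbers. The function and variable names are illustrative, and the split into 8 growing bytes and 3 stable bytes is only inferred from the sizes, not from anything known about Emacs internals.

```python
# Sizes reported by `wc -c` in the recipe above, one per save cycle.
REPORTED_SIZES = [11, 19, 35, 67, 131, 259, 515, 1027, 2051]


def predicted_size(k, growing=8, stable=3):
    """Size after k saves if `growing` bytes double per save and `stable` bytes do not."""
    return growing * 2 ** k + stable


def fits_recurrence(sizes):
    """True if every size equals twice the previous one minus 3."""
    return all(b == 2 * a - 3 for a, b in zip(sizes, sizes[1:]))


if __name__ == "__main__":
    print(fits_recurrence(REPORTED_SIZES))
    print([predicted_size(k) for k in range(len(REPORTED_SIZES))] == REPORTED_SIZES)
```

Both checks hold for the nine sizes in the listing, so the growth is exponential in the number of saves, with a small part of the file that stays constant.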
> 
> * (Tentative) Explanation:
> 
>   - Even though the file is saved in "raw" mode, it is read in another
>     mode that prefixes the "special" characters with a Unicode code.
>   - Due to symbols from incompatible encodings, Emacs is confused about
>     which encoding to use for saving and asks the user about it.
> 
> * Why it matters:
> 
>   - The faulty sequence above occurred naturally from copying and
>     pasting from various webpages (containing accented characters) into
>     the same document, and was identified when some files grew too large.
>   - Files (e.g. of notes) end up doubling in size at each edit, until
>     they fill the memory and/or hard drive, slow down the system and
>     make Emacs complain about the size of the file.
> 
> * (Potential) Solutions:
> 
>   - When saving a file with conflicting encodings, instead of merely
>     suggesting the raw encoding, add an option to "clean" the file,
>     for instance by projecting it onto an encoding and deleting all
>     symbols that are incompatible with that encoding.
> 
> I think that I reported this bug one year ago in Emacs 23 and was told
> at the time that it would be solved by the next version (24), but I
> noticed recently that this undesirable behavior is still there :(

It's not a bug.  When you modify a file, its size can grow, sometimes
a lot, due to a change in encoding.  This is intended behavior.

To avoid the problem in the first place, once you discover that the
file was visited with raw-text encoding, use "C-x RET r" to re-visit
the buffer in the encoding you think is correct, and then manually fix
the bad sequences.  Then the growth will not happen.
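
As a rough illustration of why a consistent encoding stops the growth, the Python sketch below uses Python's own codecs as a stand-in; it does not model Emacs coding systems. It assumes a hypothetical file with 3 ASCII bytes and 8 non-ASCII bytes (a split inferred only from the reported sizes) and compares a mismatched round trip (read as Latin-1, written as UTF-8) with a matching one (read and written as Latin-1). All names in the code are illustrative.

```python
def round_trip(data: bytes, read_as: str, write_as: str) -> bytes:
    """Decode bytes with one codec and re-encode them with another."""
    return data.decode(read_as).encode(write_as)


def sizes_after_saves(data: bytes, read_as: str, write_as: str, saves: int) -> list:
    """Return the byte count before the first save and after each save."""
    sizes = [len(data)]
    for _ in range(saves):
        data = round_trip(data, read_as, write_as)
        sizes.append(len(data))
    return sizes


# Hypothetical content: 3 ASCII bytes plus 8 bytes in the range 0x80-0xFF.
SAMPLE = b"N\n\n" + bytes([0xE0, 0xA5, 0x86, 0xE0, 0xA4, 0x86, 0x86, 0x86])

if __name__ == "__main__":
    # Mismatched codecs: every non-ASCII byte becomes two bytes per save.
    print(sizes_after_saves(SAMPLE, "latin-1", "utf-8", 4))
    # Matching codecs: the round trip is lossless and the size is stable.
    print(sizes_after_saves(SAMPLE, "latin-1", "latin-1", 4))
```

With the mismatched pair the sizes come out as 11, 19, 35, 67, 131, the same progression as in the report; with the matching pair the size stays at 11.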




