bug-gnu-emacs

bug#17343: 24.2; Exponential growth of files using raw-mode


From: Eli Zaretskii
Subject: bug#17343: 24.2; Exponential growth of files using raw-mode
Date: Fri, 25 Apr 2014 10:13:29 +0300

> Date: Thu, 24 Apr 2014 15:58:41 -0300
> From: Jeremy Barbay <jbarbay@dcc.uchile.cl>
> 
> The short recipe below shows how a user can end up with a file that
> doubles in size each time it is saved, when following Emacs'
> suggestion to save it in "raw mode":
> 
> * Recipe:
> 
>   1. Save the following line in a file "testAccentsMinimal.txt"
> 
>   Nà¥\206à¤\206\206à¥\206
> 
>   2. Repeatedly, 
> 
>      0) measure the size of the file (wc -c testAccentsMinimal.txt); 
>      1) open emacs loading the file (emacs -q testAccentsMinimal.txt);
>      2) insert and delete a character in it (manually);
>      3) save it selecting the suggested raw encoding (manually);
>      4) quit emacs (or force the reload of the file).
> 
> * Result:
> 
>   This should give something akin to the following, where one can see
>   the size of the file growing exponentially with the number of saves.
> 
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   11 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   19 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   35 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   67 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   131 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   259 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   515 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -q testAccentsMinimal.txt
>   1027 testAccentsMinimal.txt
>   >wc -c testAccentsMinimal.txt ; emacs -Q testAccentsMinimal.txt
>   2051 testAccentsMinimal.txt
> 
> * (Tentative) Explanation:
> 
>   - Even though the file is saved in "raw" mode, it is read back in
>     another mode, which prefixes the "special" characters with a
>     Unicode code.
>   - Due to symbols from incompatible encodings, emacs is confused about
>     which encoding to use for saving and asks the user about it.
> 
> * Why it matters:
> 
>   - The faulty sequence above occurred naturally from copy-pasting
>     from various webpages (containing accented characters) into the
>     same document, and was identified when some files grew too large.
>   - Files (e.g. of notes) end up doubling in size at each edit, until
>     they fill the memory and/or hard drive, slow down the system and
>     make Emacs complain about the size of the file.
> 
> * (Potential) Solutions:
> 
>   - When saving a file with conflicting encodings, instead of merely
>     suggesting the raw encoding, add an option to "clean" the file,
>     for instance by projecting it onto one encoding and deleting all
>     symbols which are incompatible with it.
> 
> I think that I reported this bug 1 year ago in Emacs 23 and was told
> at the time that it would be solved in the next version (24), but I
> noticed recently that this undesirable behavior is still there :(

It's not a bug.  When you modify a file, its size can grow, sometimes
a lot, due to a change in encoding.  This is intended behavior.
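
As a rough illustration of how a change in encoding can double the
non-ASCII part of a file on every save, here is a minimal Python
sketch.  It is only an analogy, not a description of what Emacs does
internally: the sample data and the helper misread_and_resave are
hypothetical, and the sketch assumes the bytes are misread as Latin-1
and written back as UTF-8 on each round.

```python
# Hypothetical sample, NOT the reporter's file: 3 ASCII bytes plus
# four 2-byte UTF-8 characters, 11 bytes in total.
data = "abc\u00e9\u00e9\u00e9\u00e9".encode("utf-8")


def misread_and_resave(raw: bytes) -> bytes:
    """Decode bytes as Latin-1 (one character per byte), then
    re-encode the result as UTF-8 (assumed, illustrative round trip)."""
    return raw.decode("latin-1").encode("utf-8")


sizes = []
for _ in range(9):
    sizes.append(len(data))          # size before this "save"
    data = misread_and_resave(data)  # every byte >= 0x80 becomes 2 bytes

print(sizes)
```

Under these assumptions the ASCII bytes keep their size while every
other byte doubles on each round, so each size is twice the previous
one minus 3.  The sizes in the quoted report follow the same relation
(19 = 2*11 - 3, 35 = 2*19 - 3, and so on).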

To avoid the problem in the first place, once you discover that the
file was visited with raw-text encoding, use "C-x RET r" to re-visit
the buffer in the encoding you think is correct, and then manually fix
the bad sequences.  Then the growth will not happen.
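
To help locate the bad sequences before fixing them by hand, one
could scan the file outside Emacs.  The Python sketch below assumes
the intended encoding is UTF-8 and reports the byte offsets where
strict decoding fails; the function name find_invalid_utf8 and the
sample bytes are hypothetical, chosen only for illustration.

```python
def find_invalid_utf8(raw: bytes) -> list[int]:
    """Return the byte offsets at which strict UTF-8 decoding fails."""
    offsets = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as err:
            bad = pos + err.start  # err.start is relative to the slice
            offsets.append(bad)
            pos = bad + 1          # skip the offending byte and resume
    return offsets


# Hypothetical sample: valid UTF-8 with one stray 0x86 byte at offset 7.
sample = b"N\xe0\xa5\x86\xe0\xa4\x86\x86\xe0\xa5\x86"
print(find_invalid_utf8(sample))
```

For a real file, the bytes could be read with
open(path, "rb").read() and passed to the same function.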




