* [resend] pre-23 Rmail bug makes unrmail unavoidably? corrupt messages
@ 2011-04-02 2:41 Mark Lillibridge
From: Mark Lillibridge @ 2011-04-02 2:41 UTC (permalink / raw)
To: emacs-devel
I finally had a chance to get back to my project of converting all my
BABYL email to mbox. I'm still trying to get 8-bit characters to
convert properly.
I've found a new bug in pre-23 Rmail that throws away information
necessary for unrmail to work properly. :-(
As I have described on this list before, Rmail decodes new messages
without MIME charset specifications for their bodies (this includes most
multipart MIME) using undecided-unix. It is supposed to save the
decoding it used for the message using a new X-Coding-System header
line. The bug is that it saves the charset it tried to decode with, not
the charset that actually got used:
rmail.el (version 22.3.1):1899:
;; Decode the region specified by FROM and TO by CODING.
;; If CODING is nil or an invalid coding system, decode by `undecided'.
(defun rmail-decode-region (from to coding)
  (if (or (not coding) (not (coding-system-p coding)))
      (setq coding 'undecided))
  ;; Use -dos decoding, to remove ^M characters left from base64 or
  ;; rogue qp-encoded text.
  (decode-coding-region from to
                        (coding-system-change-eol-conversion coding 1))
  ;; Don't reveal the fact we used -dos decoding, as users generally
  ;; will not expect the RMAIL buffer to use DOS EOL format.
  (setq buffer-file-coding-system
        (setq last-coding-system-used
              (coding-system-change-eol-conversion coding 0))))
The problem is with the last line, where we use coding instead of
last-coding-system-used. If we were still patching v22, the fix would
be simply to replace coding with last-coding-system-used in that line.
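For concreteness, here is a rough sketch of what that one-line change
might look like. It is not a tested patch against 22.3.1; the function
name my-rmail-decode-region-fixed is made up so as not to shadow the
real one, and it relies on decode-coding-region setting
last-coding-system-used to the coding system it actually used.

```elisp
;; Sketch only: same shape as rmail-decode-region, but the final form
;; reads last-coding-system-used (set by decode-coding-region to the
;; coding system actually used) instead of the requested CODING.
;; The name my-rmail-decode-region-fixed is hypothetical.
(defun my-rmail-decode-region-fixed (from to coding)
  "Decode the region FROM..TO by CODING; return the coding actually used."
  (if (or (not coding) (not (coding-system-p coding)))
      (setq coding 'undecided))
  ;; Use -dos decoding, to remove ^M characters, as in the original.
  (decode-coding-region from to
                        (coding-system-change-eol-conversion coding 1))
  ;; Hide the -dos variant, but keep the detected coding system.
  (setq buffer-file-coding-system
        (setq last-coding-system-used
              (coding-system-change-eol-conversion
               last-coding-system-used 0))))
```

With CODING nil or undecided, the value recorded this way should be
whatever detection settled on rather than undecided-unix.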
Thus, if I receive a message containing multipart MIME with say a
latin-1 8bit encoding attachment (the MIT mailer converts some 7bit to
8bit), Rmail will decode it correctly using iso-latin-1, but record only
that it used undecided-unix. This is problematic for unrmail because
the resulting code points can be encoded using both UTF-8 and Latin-1
among other encodings. unrmail encodes the message using undecided-unix
which results in using UTF-8 in my case (depends on encoding priorities
I think -- I believe I'm using the defaults), which needless to say is
wrong and leads to garbled characters in the resulting message when
interpreted as Latin-1 as specified by the MIME encoding.
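To illustrate the ambiguity, here is a small sketch (the function name
my-show-encoding-ambiguity and the sample text are just for
illustration). The same decoded text encodes to different bytes under
Latin-1 and UTF-8, and reading the UTF-8 bytes back as Latin-1 gives
the garbling I am seeing.

```elisp
;; Illustration only: one decoded string, two possible encodings.
;; my-show-encoding-ambiguity is a hypothetical helper name.
(defun my-show-encoding-ambiguity ()
  "Return (LATIN1-LENGTH UTF8-LENGTH GARBLED) for a sample string."
  (let* ((text "caf\u00e9")             ; "cafe" with e-acute
         (latin1-bytes (encode-coding-string text 'iso-latin-1))
         (utf8-bytes (encode-coding-string text 'utf-8))
         ;; What a reader sees if the UTF-8 bytes are interpreted as
         ;; Latin-1, as the MIME headers would direct.
         (garbled (decode-coding-string utf8-bytes 'iso-latin-1)))
    (list (length latin1-bytes) (length utf8-bytes) garbled)))
```

The e-acute is one byte in Latin-1 but two bytes in UTF-8, so the
garbled result has one character more than the original.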
In general, unrmail has to make a guess in these cases and it will
unavoidably guess wrong some of the time. I recommend at a minimum
printing a warning message in these cases -- I may make a patch for this
later -- and possibly writing a better heuristic (look for charset='s in
the body?) to minimize wrong guesses.
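A minimal sketch of what such a heuristic plus warning could look like
follows. This is not unrmail code: both function names are
hypothetical, the regexp is deliberately crude, and taking the first
charset= found is an assumption that will be wrong for messages whose
parts use different charsets.

```elisp
;; Sketch only; both function names are hypothetical.
(defun my-guess-charset-in-region (from to)
  "Return the first valid coding system named by charset= in FROM..TO."
  (save-excursion
    (goto-char from)
    (let ((case-fold-search t)
          found)
      (while (and (not found)
                  (re-search-forward
                   "charset=\"?\\([^\" \t\n;]+\\)" to t))
        ;; Assumes the MIME charset name is also an Emacs coding
        ;; system name once downcased (true for e.g. iso-8859-1).
        (let ((sym (intern (downcase (match-string 1)))))
          (when (coding-system-p sym)
            (setq found sym))))
      found)))

(defun my-choose-coding (recorded from to)
  "Pick a coding system for re-encoding, warning when guessing."
  (if (and recorded
           (coding-system-p recorded)
           (not (eq (coding-system-base recorded) 'undecided)))
      recorded
    (or (my-guess-charset-in-region from to)
        (progn
          (message "Warning: no usable charset found; guessing")
          recorded))))
```

A recorded value of undecided-unix is treated as "unknown" here, which
is exactly the case the bug above produces.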
- Mark