* [resend] pre-23 Rmail bug makes unrmail unavoidably? corrupt messages
From: Mark Lillibridge @ 2011-04-02  2:41 UTC
  To: emacs-devel


    I finally had a chance to get back to my project to convert all my
BABYL email to mbox.  I'm still trying to get 8-bit characters to
convert properly.


    I've found a new bug in pre-23 Rmail that throws away information
necessary for unrmail to work properly.  :-( 

    As I have described on this list before, Rmail decodes new messages
without MIME charset specifications for their bodies (this includes most
multipart MIME) using undecided-unix.  It is supposed to save the
decoding it used for the message using a new X-Coding-System header
line.  The bug is that it saves the charset it tried to decode with, not
the charset that actually got used:

rmail.el (version 22.3.1):1899:
;; Decode the region specified by FROM and TO by CODING.
;; If CODING is nil or an invalid coding system, decode by `undecided'.
(defun rmail-decode-region (from to coding)
  (if (or (not coding) (not (coding-system-p coding)))
      (setq coding 'undecided))
  ;; Use -dos decoding, to remove ^M characters left from base64 or
  ;; rogue qp-encoded text.
  (decode-coding-region from to
			(coding-system-change-eol-conversion coding 1))
  ;; Don't reveal the fact we used -dos decoding, as users generally
  ;; will not expect the RMAIL buffer to use DOS EOL format.
  (setq buffer-file-coding-system
	(setq last-coding-system-used
	      (coding-system-change-eol-conversion coding 0))))

    The problem is with the last line, where we use coding instead of
last-coding-system-used.  If we were still patching v22, the fix would
be simply to replace coding with last-coding-system-used there.
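
    A minimal sketch of that one-line change, for illustration only.
The name my-rmail-decode-region-fixed is hypothetical (chosen so it does
not clobber the real rmail-decode-region), and this has not been tested
against 22.3.  It relies on decode-coding-region setting
last-coding-system-used to the coding system it actually used:

```elisp
;; Hypothetical fixed variant of rmail-decode-region; a sketch, not a
;; tested patch.
(defun my-rmail-decode-region-fixed (from to coding)
  "Decode the region FROM..TO by CODING and record what was really used."
  (if (or (not coding) (not (coding-system-p coding)))
      (setq coding 'undecided))
  ;; Use -dos decoding, to remove ^M characters, as in the original.
  (decode-coding-region from to
                        (coding-system-change-eol-conversion coding 1))
  ;; decode-coding-region has now set last-coding-system-used to the
  ;; coding system it actually decoded with, so record that one (with
  ;; Unix EOLs) instead of the CODING argument we merely asked for.
  (setq buffer-file-coding-system
        (setq last-coding-system-used
              (coding-system-change-eol-conversion
               last-coding-system-used 0))))
```

With this change, a message decoded via undecided that turned out to be
Latin-1 should be recorded as such rather than as undecided-unix.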


    Thus, if I receive a multipart MIME message with, say, a Latin-1
8-bit encoded attachment (the MIT mailer converts some 7bit to 8bit),
Rmail will decode it correctly using iso-latin-1 but record only that it
used undecided-unix.  This is a problem for unrmail because the
resulting code points can be encoded in both UTF-8 and Latin-1, among
other encodings.  unrmail encodes the message using undecided-unix,
which in my case results in UTF-8 (this depends on the encoding
priorities, I think; I believe I'm using the defaults).  That is wrong,
and it leads to garbled characters when the resulting message is
interpreted as Latin-1 as specified by the MIME encoding.
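
    To make the ambiguity concrete, here is a small illustration.  It
is not taken from Rmail or unrmail; the function name is hypothetical,
and it assumes an Emacs with Unicode-based internals (23 or later).
The same four characters encode to different bytes in the two
encodings, and reading the UTF-8 bytes as Latin-1 gives the garbling I
am seeing:

```elisp
;; Hypothetical demo function, not part of Rmail or unrmail.
(defun my-encoding-ambiguity-demo ()
  "Return byte lengths of a sample string in Latin-1 and UTF-8,
plus the result of misreading the UTF-8 bytes as Latin-1."
  (let* ((text (string ?c ?a ?f #xe9))       ; "caf" + e-acute
         (as-latin-1 (encode-coding-string text 'iso-latin-1))
         (as-utf-8 (encode-coding-string text 'utf-8)))
    (list (length as-latin-1)                ; 4 bytes
          (length as-utf-8)                  ; 5 bytes
          ;; What a reader sees if the UTF-8 bytes are labelled Latin-1:
          ;; the e-acute becomes two characters, U+00C3 and U+00A9.
          (decode-coding-string as-utf-8 'iso-latin-1))))
```

Both encodings are valid for this text, so nothing in the decoded
buffer alone tells unrmail which one the original message used.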


    In general, unrmail has to make a guess in these cases and it will
unavoidably guess wrong some of the time.  I recommend at a minimum
printing a warning message in these cases -- I may make a patch for this
later -- and possibly writing a better heuristic (look for charset=
parameters in the body?) to minimize wrong guesses.
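
    A rough sketch of what such a heuristic plus warning might look
like.  This is not a patch against unrmail: the function name is
hypothetical, and it assumes the MIME charset name is also a valid
Emacs coding system name (true for common ones such as iso-8859-1 and
utf-8, but not guaranteed in general):

```elisp
;; Hypothetical helper; a sketch of the heuristic, not unrmail code.
(defun my-unrmail-guess-coding (beg end)
  "Return the coding system named by the first usable charset=
parameter between BEG and END, or nil (with a warning) if none."
  (save-excursion
    (goto-char beg)
    (let ((case-fold-search t)
          (found nil))
      (while (and (not found)
                  (re-search-forward
                   "charset=\"?\\([-A-Za-z0-9_.:]+\\)" end t))
        ;; intern-soft avoids creating symbols for unknown names.
        (let ((sym (intern-soft (downcase (match-string 1)))))
          (when (and sym (coding-system-p sym))
            (setq found sym))))
      (unless found
        (message "Warning: no usable charset= found; guessing coding system"))
      found)))
```

Taking the first charset= is itself a guess, since a multipart message
can carry parts in several charsets, so the warning would still be
worth printing whenever the recorded coding system is undecided.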

- Mark


