From: Mark Lillibridge
Newsgroups: gmane.emacs.devel
Subject: [resend] pre-23 Rmail bug makes unrmail unavoidably? corrupt messages
Date: Fri, 01 Apr 2011 19:41:58 -0700
Reply-To: mark.lillibridge@hp.com
To: emacs-devel@gnu.org

I finally had a chance to get back to my "convert all my BABYL email to
mbox" project. I'm still trying to get 8-bit characters to convert
properly. I've found a new bug in pre-23 Rmail that throws away
information necessary for unrmail to work properly. :-(

As I have described on this list before, Rmail decodes new messages
without MIME charset specifications for their bodies (this includes most
multipart MIME) using undecided-unix. It is supposed to save the decoding
it used for the message in a new X-Coding-System header line. The bug is
that it saves the charset it tried to decode with, not the charset that
actually got used:

rmail.el (version 22.3.1):1899:

    ;; Decode the region specified by FROM and TO by CODING.
    ;; If CODING is nil or an invalid coding system, decode by `undecided'.
    (defun rmail-decode-region (from to coding)
      (if (or (not coding) (not (coding-system-p coding)))
          (setq coding 'undecided))
      ;; Use -dos decoding, to remove ^M characters left from base64 or
      ;; rogue qp-encoded text.
      (decode-coding-region from to
                            (coding-system-change-eol-conversion coding 1))
      ;; Don't reveal the fact we used -dos decoding, as users generally
      ;; will not expect the RMAIL buffer to use DOS EOL format.
      (setq buffer-file-coding-system
            (setq last-coding-system-used
                  (coding-system-change-eol-conversion coding 0))))

The problem is with the last line, which uses coding instead of
last-coding-system-used. If we were still patching version 22, the fix
would be simply to replace coding with last-coding-system-used there.

Thus, if I receive a message containing multipart MIME with, say, a
Latin-1 8bit-encoded attachment (the MIT mailer converts some 7bit to
8bit), Rmail will decode it correctly using iso-latin-1 but record only
that it used undecided-unix.

This is problematic for unrmail because the resulting code points can be
encoded using both UTF-8 and Latin-1, among other encodings. unrmail
encodes the message using undecided-unix, which results in UTF-8 in my
case (this depends on encoding priorities, I think -- I believe I'm using
the defaults). That is, needless to say, wrong, and it leads to garbled
characters when the resulting message is interpreted as Latin-1, as
specified by the MIME encoding.

In general, unrmail has to guess in these cases, and it will unavoidably
guess wrong some of the time. I recommend at a minimum printing a warning
message in these cases -- I may make a patch for this later -- and
possibly writing a better heuristic (look for charset='s in the body?) to
minimize wrong guesses.

- Mark
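
To make the one-word fix and the resulting garbling concrete, here is a minimal sketch in Emacs Lisp. It is not a tested patch against rmail.el: the names my-rmail-decode-region-fixed and my-latin-1-vs-utf-8-demo are hypothetical, and the first function assumes that decode-coding-region leaves the coding system it actually used in last-coding-system-used, as described above. The second function only illustrates why re-encoding Latin-1 text as UTF-8 garbles it for a reader that trusts the MIME charset.

```elisp
;; Hypothetical variant of `rmail-decode-region' (not part of rmail.el).
;; Assumption: `decode-coding-region' sets `last-coding-system-used' to
;; the coding system that detection actually chose.
(defun my-rmail-decode-region-fixed (from to coding)
  "Decode FROM..TO by CODING and record the coding system really used."
  (if (or (not coding) (not (coding-system-p coding)))
      (setq coding 'undecided))
  ;; Same -dos decoding as the original, to strip stray ^M characters.
  (decode-coding-region
   from to (coding-system-change-eol-conversion coding 1))
  ;; The one change: derive the recorded value from
  ;; `last-coding-system-used' rather than from CODING.
  (setq buffer-file-coding-system
        (setq last-coding-system-used
              (coding-system-change-eol-conversion
               last-coding-system-used 0))))

;; Hypothetical demo of the ambiguity: the same characters encode to
;; different bytes under Latin-1 and UTF-8.
(defun my-latin-1-vs-utf-8-demo ()
  "Return (LATIN-1-LENGTH UTF-8-LENGTH GARBLED) for c, a, f, e-acute."
  (let* ((text (string ?c ?a ?f #xe9))
         (latin-1-bytes (encode-coding-string text 'iso-latin-1))
         (utf-8-bytes (encode-coding-string text 'utf-8)))
    (list (length latin-1-bytes)          ; e-acute is one byte in Latin-1
          (length utf-8-bytes)            ; and two bytes in UTF-8
          ;; What a reader sees if the UTF-8 bytes are labeled Latin-1:
          (decode-coding-string utf-8-bytes 'iso-latin-1))))
```

Evaluating (my-latin-1-vs-utf-8-demo) returns the byte lengths 4 and 5 and a five-character string in which the single e-acute has become two characters, which is the kind of corruption described above.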