From: Paul Michael Reilly
Newsgroups: gmane.emacs.devel
Subject: Re: Need some help with Rmail/mbox
Date: Fri, 19 Sep 2008 01:35:12 -0400
Message-ID: <48D33A10.4040102@pajato.com>
References: <87y71o4xw6.fsf@xemacs.org>
To: "Stephen J. Turnbull"
Cc: emacs-devel@gnu.org

Stephen J. Turnbull wrote:

Thanks for stepping up to this. Your help is very much appreciated!

> All you should need to do for a first cut is to copy the message to a
> new buffer, which doesn't need to be narrowed, but might need to have
> some Babyl sentinels added.

I first copy the relevant headers to the view buffer. I collect them from the PMAIL buffer into a string and insert the string into the view buffer. I used the rmail.el code pretty much as is. The difference is that instead of copying and hiding, I do a selective copy and insert (I am ignoring the case of showing all headers, which is trivial). Then I copy the message body into a string and insert it into the view buffer. When I started to work on the decoding, it seemed like a good idea to decode the string before inserting it. (Pardon my Elisp rustiness: is it better to copy buffer to buffer than to insert a string?) I copied the logic for this first part of decoding from rmail-convert-to-babyl-format.

> That's basically it.  You should do the processing on buffers, not
> strings, though, and

Are you essentially answering my question above and saying that copying buffer to buffer is faster/better than operating on strings?

>
> > (decode-coding-string body (detect-coding-string body t))
>
> you want to parse the coding from the *header*, not guess on the body.

I do parse out quoted-printable and base64 and undo those encodings on the body before doing the coding-system-based decoding.

> If you want you can add guessing and/or user-specified MIME charsets
> as a user option, but (a) almost all genuine mail today will contain
> an appropriate Content-Type charset parameter, and (b) lack of such
> (unless all text is US-ASCII) is an extremely strong indicator of
> spam.  Few users will need to be able to read messages that have bogus
> charset parameters: this feature is not immediately necessary.

OK, makes sense.

> The general algorithm should be something like
>
>    Identify message in mbox buffer

yup

>    Copy message to presentation buffer

yup

>    Identify header and body, add Babyl sentinels if desired

Babyl sentinels? I'm not sure what you mean by this.

>    Parse headers (specifically content type)

If you had said content type and content encoding, I would have said "yup", and that is what led to my request for help. Except for quoted-printable and base64, I'm not sure how to parse those two headers (Content-Type and Content-Transfer-Encoding) into a coding system so that I can then do the decoding. I'm assuming the coding-system guesswork becomes relevant for combinations of the two headers that Rmail does not grok. I also now see that there is a strong relationship between charset and coding system.

>    Dispatch on content type and subtype:
>      Case type is text and subtype is plain
>        Identify charset parameter:
>          (or charset-from-content-type "us-ascii")
>        Map charset to Emacs coding-system
>        (decode-coding-region (body-begin) (body-end) coding-system)

OK, this is helpful.
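The text/plain dispatch above, together with the Content-Transfer-Encoding question, can be sketched as follows. This is a minimal illustration in Python rather than Elisp. It assumes a single message whose mbox "From " line has already been removed and whose header and body are separated by the first empty line with LF line endings. All function names are hypothetical and not part of Rmail. The order of the two steps is the main point. First, undo the Content-Transfer-Encoding (quoted-printable, base64) to recover the bytes that were sent. Then use the Content-Type charset to turn those bytes into characters, which is the step decode-coding-region performs in Emacs. Python's codecs.lookup stands in here for the charset-to-coding-system mapping. If the headers are absent, the sketch falls back to the RFC 2045 defaults: text/plain; charset=us-ascii and 7bit.

```python
import base64
import codecs
import quopri


def parse_headers(head: bytes) -> dict[str, str]:
    """Return a lowercase-name -> value map, unfolding continuation lines.

    Simplification: a repeated header keeps only its last value, and
    non-ASCII bytes in headers are replaced rather than decoded.
    """
    fields: dict[str, str] = {}
    name = None
    for line in head.decode("ascii", "replace").split("\n"):
        if line[:1] in (" ", "\t") and name is not None:
            fields[name] += " " + line.strip()  # folded continuation line
        elif ":" in line:
            key, _, value = line.partition(":")
            name = key.strip().lower()
            fields[name] = value.strip()
    return fields


def parse_content_type(value: str) -> tuple[str, dict[str, str]]:
    """Split 'text/plain; charset="ISO-8859-1"' into type/subtype and params.

    Simplification: ignores ';' inside quoted strings and RFC 2231 parameters.
    """
    parts = [p.strip() for p in value.split(";")]
    mime_type = parts[0].lower()
    params: dict[str, str] = {}
    for part in parts[1:]:
        key, eq, val = part.partition("=")
        if eq:
            params[key.strip().lower()] = val.strip().strip('"')
    return mime_type, params


def undo_transfer_encoding(body: bytes, cte: str) -> bytes:
    """Recover the bytes that were sent, before any charset decoding."""
    cte = cte.strip().lower()
    if cte == "quoted-printable":
        return quopri.decodestring(body)
    if cte == "base64":
        return base64.b64decode(b"".join(body.split()))
    return body  # 7bit, 8bit, binary: the body is already the raw bytes


def decode_text_body(raw: bytes) -> str:
    """Identify header/body, parse headers, dispatch, and decode the body."""
    head, _, body = raw.partition(b"\n\n")
    fields = parse_headers(head)
    # RFC 2045 defaults apply when the headers are missing.
    mime_type, params = parse_content_type(
        fields.get("content-type", "text/plain; charset=us-ascii"))
    data = undo_transfer_encoding(
        body, fields.get("content-transfer-encoding", "7bit"))
    if mime_type != "text/plain":
        # Punt for now: show the bytes as raw text (latin-1 never fails).
        return data.decode("latin-1")
    # (or charset-from-content-type "us-ascii"), then map charset -> codec.
    charset = params.get("charset", "us-ascii")
    try:
        codec = codecs.lookup(charset).name
    except LookupError:
        codec = "us-ascii"  # unknown or bogus charset: fall back
    return data.decode(codec, errors="replace")
```

In Rmail, the same steps would presumably run on regions of the view buffer, calling decode-coding-region on the body instead of decoding a Python bytes object.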
I assume that for all other type/subtype cases we punt for now and use guessing or just show raw text? But there are certainly some that we want to process/decode in some fashion, e.g. text/html or text/xml. Are you aware of another Emacs package/library that provides a good model for where we want to take Rmail, so that it handles more type/subtype cases seamlessly in the view buffer? Perhaps even audio and video (not full MIME, i.e. multipart ... yet).

> Wash header for presentation, eg:
>   Hide non-displayed header
>   Decode RFC 2047-encoded headers

OK, this is helpful, but I would add that non-displayed headers do not need to be in the view buffer at all. The view buffer contains either all the headers or just the displayed ones, depending on what the user currently wants.

> Wash body for presentation, eg:
>   Highlight and activate url-like substrings
>   Highlight quoted material

I don't believe Rmail does either of these operations now. Is that your understanding? If I'm right and this washing is not done, then adding it is very high on my priority list. If I'm wrong, please point me to where it gets done or tell me how to enable it.

> Display buffer in window

yup

Thanks again,
-pmr
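The header and body washing steps quoted above can be sketched in the same way. This is again a minimal Python illustration, not Rmail code. decode_rfc2047 relies on Python's standard email.header module (decode_header plus make_header). In Emacs, one would presumably reuse the existing rfc2047.el decoder (rfc2047-decode-string) instead. wash_body and its two regular expressions are hypothetical heuristics that only compute (start, end, kind) spans. Turning those spans into faces and clickable links in the view buffer is left out.

```python
import re
from email.header import decode_header, make_header

# Heuristic patterns (assumptions, not Rmail's): URLs with an explicit scheme,
# and lines whose first non-blank text is an optional short tag like "pmr" then '>'.
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s<>\"]+|\bmailto:[^\s<>\"]+")
QUOTE_RE = re.compile(r"^[ \t]*[A-Za-z]{0,4}>")


def decode_rfc2047(value: str) -> str:
    """Decode RFC 2047 encoded-words, e.g. =?ISO-8859-1?Q?caf=E9?=."""
    return str(make_header(decode_header(value)))


def wash_body(text: str) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, kind) spans; kind is "url" or "quote"."""
    spans: list[tuple[int, int, str]] = []
    offset = 0
    for line in text.splitlines(keepends=True):
        if QUOTE_RE.match(line):
            # Highlight the quoted line without its line terminator.
            spans.append((offset, offset + len(line.rstrip("\r\n")), "quote"))
        offset += len(line)
    for m in URL_RE.finditer(text):
        end = m.end()
        # Trailing punctuation usually ends the sentence, not the URL.
        while end > m.start() and text[end - 1] in ".,;:!?)":
            end -= 1
        spans.append((m.start(), end, "url"))
    return sorted(spans)
```

Keeping detection separate from display makes it straightforward to expose each washing pass as its own user option.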