From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: Paul Michael Reilly <pmr@pajato.com>
Newsgroups: gmane.emacs.devel
Subject: Re: Need some help with Rmail/mbox
Date: Fri, 19 Sep 2008 01:35:12 -0400
Message-ID: <48D33A10.4040102@pajato.com>
References: <lu8wtp1lys.fsf@pajato.com> <87y71o4xw6.fsf@xemacs.org>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: ger.gmane.org 1221802539 31112 80.91.229.12 (19 Sep 2008 05:35:39 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Fri, 19 Sep 2008 05:35:39 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: "Stephen J. Turnbull" <stephen@xemacs.org>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Fri Sep 19 07:36:36 2008
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1KgYfT-0005LJ-7m
	for ged-emacs-devel@m.gmane.org; Fri, 19 Sep 2008 07:36:36 +0200
Original-Received: from localhost ([127.0.0.1]:33838 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1KgYeQ-0007jI-So
	for ged-emacs-devel@m.gmane.org; Fri, 19 Sep 2008 01:35:30 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1KgYeI-0007g5-5r
	for emacs-devel@gnu.org; Fri, 19 Sep 2008 01:35:22 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1KgYeF-0007at-7P
	for emacs-devel@gnu.org; Fri, 19 Sep 2008 01:35:21 -0400
Original-Received: from [199.232.76.173] (port=35560 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1KgYeE-0007aN-Ve
	for emacs-devel@gnu.org; Fri, 19 Sep 2008 01:35:18 -0400
Original-Received: from mail.pajato.com ([68.191.253.210]:44634 helo=copa.pajato.com)
	by monty-python.gnu.org with esmtps
	(TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60)
	(envelope-from <pmr@pajato.com>) id 1KgYeD-0005hQ-TM
	for emacs-devel@gnu.org; Fri, 19 Sep 2008 01:35:18 -0400
Original-Received: from [127.0.0.1] (copa.pajato.com [68.191.253.210])
	by copa.pajato.com (8.14.1/8.13.8) with ESMTP id m8J5ZCUl025020;
	Fri, 19 Sep 2008 01:35:12 -0400
User-Agent: Thunderbird 2.0.0.16 (X11/20080723)
In-Reply-To: <87y71o4xw6.fsf@xemacs.org>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0
	(copa.pajato.com [68.191.253.210]);
	Fri, 19 Sep 2008 01:35:13 -0400 (EDT)
X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6 (newer, 3)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:103966
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/103966>

Stephen J. Turnbull wrote:

Thanks for stepping up to this.  Your help is very much appreciated!

> All you should need to do for a first cut is to copy the message to a
> new buffer, which doesn't need to be narrowed, but might need to have
> some Babyl sentinels added.

I first copy the relevant headers to the view buffer by collecting
them from the PMAIL buffer into a string and inserting the string into
the view buffer.  I used the rmail.el code pretty much as is, but
instead of copying and hiding I do a selective copy and insert
(ignoring the case of showing all headers, which is trivial).

Then I basically copy the message body into a string and insert it
into the view buffer.  But when I started to work on the decoding,
decoding the string before inserting it seemed like a good idea.
(Pardon my Elisp rustiness ... is it better to use buffer-to-buffer
copying than inserting a string?)  I copied the logic for this first
part of decoding from rmail-convert-to-babyl-format.

> That's basically it.  You should do the processing on buffers, not
> strings, though, and

Are you essentially answering my question above and saying that
copying buffer to buffer is faster/better than operating on strings?
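
For illustration, a minimal sketch of the buffer-to-buffer approach,
assuming the message bounds are already known.  The function name
my-copy-message and its arguments are hypothetical and not part of
Rmail; insert-buffer-substring is the standard primitive.

```elisp
(defun my-copy-message (src beg end view)
  "Copy the text between BEG and END of buffer SRC into buffer VIEW.
Hypothetical helper, not part of Rmail.  VIEW is erased first, and
no intermediate Lisp string is created."
  (with-current-buffer view
    (let ((inhibit-read-only t))   ; the view buffer may be read-only
      (erase-buffer)
      (insert-buffer-substring src beg end))))
```

With this shape, all later decoding can operate on regions of the
view buffer rather than on strings.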

> 
>  >        (decode-coding-string body (detect-coding-string body t))
> 
> you want to parse the coding from the *header*, not guess on the body.

I do parse out quoted-printable and base64 and apply the matching
decoding to the body before doing the coding-system-based decoding.
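
A rough sketch of that two-step order (transfer decoding first, then
charset decoding), done on a region instead of a string.  The name
my-decode-body and its argument list are assumptions made for this
example; base64-decode-region is built in and
quoted-printable-decode-region comes from qp.el.  Handling of 8-bit
bytes in a multibyte buffer is not addressed here.

```elisp
(require 'qp)  ; provides `quoted-printable-decode-region'

(defun my-decode-body (beg end transfer-encoding coding-system)
  "Decode the region BEG..END in place.
Hypothetical helper.  TRANSFER-ENCODING is the value of the
Content-Transfer-Encoding header or nil; CODING-SYSTEM is the
coding system derived from the charset parameter."
  (save-restriction
    (narrow-to-region beg end)
    (let ((enc (and transfer-encoding (downcase transfer-encoding))))
      ;; Step 1: undo the transfer encoding, if there is one.
      (cond ((equal enc "base64")
             (base64-decode-region (point-min) (point-max)))
            ((equal enc "quoted-printable")
             (quoted-printable-decode-region (point-min) (point-max)))))
    ;; Step 2: decode the resulting text with the charset's coding system.
    (decode-coding-region (point-min) (point-max) coding-system)))
```

Narrowing first means the region end does not need to be tracked by
hand while the transfer decoding shrinks the text.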

> If you want you can add guessing and/or user-specified MIME charsets
> as a user option, but (a) almost all genuine mail today will contain
> an appropriate Content-Type charset parameter, and (b) lack of such
> (unless all text is US-ASCII) is an extremely strong indicator of
> spam.  Few users will need to be able to read messages that have bogus
> charset parameters: this feature is not immediately necessary.

OK, makes sense.

> The general algorithm should be something like
> 
> Identify message in mbox buffer

yup

> Copy message to presentation buffer

yup

> Identify header and body, add Babyl sentinels if desired

Babyl sentinels?  I'm not sure what you mean by this.

> Parse headers (specifically content type)

If you had said content type and content encoding I would have said
"yup" and that is what led to my request for help.  Except for the
case of quoted-printable and base64 I'm not sure how to parse those
two headers (Content-Type and Content-Transfer-Encoding) into a coding
system so that I can then do the decoding.  I'm assuming the coding
system guesswork becomes relevant for combinations of the two headers
that Rmail does not grok.  And I now see that there is a strong
relationship between charset and coding system.
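
To make the charset/coding-system relationship concrete, here is a
minimal sketch that pulls the charset parameter out of a Content-Type
value and maps it to a coding system.  The helper name
my-charset-to-coding-system is hypothetical, and the mapping is the
simplest one possible (it assumes the MIME charset name, downcased, is
also the name of an Emacs coding system), so it is an assumption
rather than how Rmail or any other package does it.

```elisp
(defun my-charset-to-coding-system (content-type)
  "Return a coding system for the charset parameter of CONTENT-TYPE.
Hypothetical helper.  Falls back to `us-ascii' when no charset is
given, and to `undecided' when Emacs has no coding system by that
name."
  (let* ((case-fold-search t)        ; \"Charset=\" should match too
         (charset (if (and content-type
                           (string-match
                            "charset=\"?\\([^\";[:space:]]+\\)"
                            content-type))
                      (downcase (match-string 1 content-type))
                    "us-ascii"))
         (cs (intern charset)))
    (if (coding-system-p cs) cs 'undecided)))

;; Example, using this message's own Content-Type header:
;; (my-charset-to-coding-system
;;  "text/plain; charset=ISO-8859-1; format=flowed")
;; => iso-8859-1
```

Returning `undecided' for unknown names is one place where the
guessing code could be kept as a fallback.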

> Dispatch on content type and subtype:
>     Case type is text and subtype is plain
>         Identify charset parameter:
>             (or charset-from-content-type "us-ascii")
>         Map charset to Emacs coding-system
>         (decode-coding-region (body-begin) (body-end) coding-system)

OK, this is helpful.  I assume that for all other type/subtype cases
we punt for now and use guessing or just raw text?  But certainly
there are some that we want to process/decode in some fashion,
e.g. text/html or text/xml.  Is there another Emacs package/library
that you are aware of that provides a good model for where we want to
take Rmail so that it handles more type/subtype cases seamlessly in
the view buffer? Even perhaps audio and video (not pure MIME,
i.e. multipart ... yet).

>         Wash header for presentation, eg:
>             Hide non-displayed header
>             Decode RFC 2047-encoded headers

OK, this is helpful, but I would add that non-displayed headers do not
need to be in the view buffer at all.  The view buffer contains either
all the headers or just the displayed ones, depending on the user's
current preference.

>         Wash body for presentation, eg:
>             Highlight and activate url-like substrings
>             Highlight quoted material

I don't believe Rmail does either of these operations now.  Is that
your understanding?  If I'm right and this washing is not done, then
adding it is very high on my priority list.  If I'm wrong, then
please point me to where it gets done or how to enable it.
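
As a sketch of what the quoted-material part of such washing could
look like, assuming that overlays are acceptable in the view buffer and
that a line starting with ">" counts as quoted.  The function name
my-highlight-quoted and the choice of font-lock-comment-face are
illustrative only; URL activation is not shown.

```elisp
(defun my-highlight-quoted (buffer)
  "Highlight lines in BUFFER that start with \">\".
Hypothetical helper.  Puts `font-lock-comment-face' on each such
line with an overlay and returns the number of lines highlighted."
  (with-current-buffer buffer
    (save-excursion
      (goto-char (point-min))
      (let ((count 0))
        (while (re-search-forward "^>.*$" nil t)
          (overlay-put (make-overlay (match-beginning 0) (match-end 0))
                       'face 'font-lock-comment-face)
          (setq count (1+ count)))
        count))))
```

Because overlays leave the buffer text untouched, the washing can be
redone or removed without copying the message again.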

> Display buffer in window

yup

Thanks again,

-pmr