From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!not-for-mail
From: "Stephen J. Turnbull" <stephen@xemacs.org>
Newsgroups: gmane.emacs.devel
Subject: Need some help with Rmail/mbox
Date: Fri, 19 Sep 2008 12:28:41 +0900
Message-ID: <87y71o4xw6.fsf@xemacs.org>
References: <lu8wtp1lys.fsf@pajato.com>
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: ger.gmane.org 1221794622 5679 80.91.229.12 (19 Sep 2008 03:23:42 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Fri, 19 Sep 2008 03:23:42 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: pmr@pajato.com
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Fri Sep 19 05:24:39 2008
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([199.232.76.165])
	by lo.gmane.org with esmtp (Exim 4.50)
	id 1KgWbm-000573-1g
	for ged-emacs-devel@m.gmane.org; Fri, 19 Sep 2008 05:24:38 +0200
Original-Received: from localhost ([127.0.0.1]:43638 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43)
	id 1KgWaj-0001DK-1E
	for ged-emacs-devel@m.gmane.org; Thu, 18 Sep 2008 23:23:33 -0400
Original-Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1KgWad-0001Ao-Ja
	for emacs-devel@gnu.org; Thu, 18 Sep 2008 23:23:27 -0400
Original-Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1KgWaZ-00018U-K1
	for emacs-devel@gnu.org; Thu, 18 Sep 2008 23:23:26 -0400
Original-Received: from [199.232.76.173] (port=32879 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1KgWaZ-00018A-Et
	for emacs-devel@gnu.org; Thu, 18 Sep 2008 23:23:23 -0400
Original-Received: from mtps01.sk.tsukuba.ac.jp ([130.158.97.223]:43044)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <stephen@xemacs.org>) id 1KgWaY-0006t7-4q
	for emacs-devel@gnu.org; Thu, 18 Sep 2008 23:23:23 -0400
Original-Received: from uwakimon.sk.tsukuba.ac.jp (uwakimon.sk.tsukuba.ac.jp
	[130.158.99.156])
	by mtps01.sk.tsukuba.ac.jp (Postfix) with ESMTP id CE4EA1535BC;
	Fri, 19 Sep 2008 12:23:05 +0900 (JST)
Original-Received: by uwakimon.sk.tsukuba.ac.jp (Postfix, from userid 1000)
	id 6554D1A2A39; Fri, 19 Sep 2008 12:28:41 +0900 (JST)
In-Reply-To: <lu8wtp1lys.fsf@pajato.com>
X-Mailer: VM 8.0.12-devo-585 under 21.5 (beta28) "fuki" 83e35df20028+ XEmacs
	Lucid (x86_64-unknown-linux)
X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6,
	seldom 2.4 (older, 4)
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/emacs-devel>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <http://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Xref: news.gmane.org gmane.emacs.devel:103962
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/103962>

Paul Michael Reilly writes:

 > As near as I can tell the task is to decode the message body in two
 > steps:

But why not just use the existing code to do this?  AIUI, the Babyl
format was designed for one-buffer operation on a pseudo-RFC-822
message, so most functions used to wash and display probably assume
that the message is in the current buffer, which is narrowed so that
the presentation header plus the body form an RFC 2822 message.

All you should need to do for a first cut is to copy the message to a
new buffer, which doesn't need to be narrowed, but might need to have
some Babyl sentinels added.

If I'm missing something, feel free to ignore me, but I don't really
understand what all you think is different about presenting a
free-standing RFC 2822 message as opposed to presenting one that is
part of a Babyl-format buffer.  I don't think they should be that
different.  The main thing is that the Babyl format caches the set of
presentation headers in the Babyl-format file, but mbox won't.  So
you'll need to hide (or remove) the non-presentation headers
one-by-one rather than by just narrowing the buffer.

 > first to decode according to the character encoding (e.g. quoted-
 > printable or base64) and then to decode that result to some coding
 > system.

That's basically it.  You should do the processing on buffers, not
strings, though, and

 >        (decode-coding-string body (detect-coding-string body t))

you want to parse the coding from the *header*, not guess on the body.
If you want you can add guessing and/or user-specified MIME charsets
as a user option, but (a) almost all genuine mail today will contain
an appropriate Content-Type charset parameter, and (b) lack of such
(unless all text is US-ASCII) is an extremely strong indicator of
spam.  Few users will need to be able to read messages that have bogus
charset parameters: this feature is not immediately necessary.

The general algorithm should be something like

Identify message in mbox buffer
Copy message to presentation buffer
Identify header and body, add Babyl sentinels if desired
Parse headers (specifically content type)
Dispatch on content type and subtype:
    Case type is text and subtype is plain
        Identify charset parameter:
            (or charset-from-content-type "us-ascii")
        Map charset to Emacs coding-system
        (decode-coding-region (body-begin) (body-end) coding-system)
        Wash header for presentation, eg:
            Hide non-displayed header
            Decode RFC 2047-encoded headers
        Wash body for presentation, eg:
            Highlight and activate url-like substrings
            Highlight quoted material
Display buffer in window