From: "Stephen J. Turnbull" <stephen@xemacs.org>
To: pmr@pajato.com
Cc: emacs-devel@gnu.org
Subject: Need some help with Rmail/mbox
Date: Fri, 19 Sep 2008 12:28:41 +0900
Message-ID: <87y71o4xw6.fsf@xemacs.org>
In-Reply-To: <lu8wtp1lys.fsf@pajato.com>

Paul Michael Reilly writes:

 > As near as I can tell the task is to decode the message body in two
 > steps:

But why not just use the existing code to do this?  AIUI, the Babyl
format was designed for one-buffer operation on a pseudo-RFC-822
message, so most functions used to wash and display probably assume
that the message is in the current buffer, which is narrowed so that
the presentation header plus the body form an RFC 2822 message.

All you should need to do for a first cut is to copy the message to a
new buffer, which doesn't need to be narrowed, but might need to have
some Babyl sentinels added.

If I'm missing something, feel free to ignore me, but I don't really
understand what all you think is different about presenting a
free-standing RFC 2822 message as opposed to presenting one that is
part of a Babyl-format buffer.  I don't think they should be that
different.  The main thing is that the Babyl format caches the set of
presentation headers in the Babyl-format file, but mbox won't.  So
you'll need to hide (or remove) the non-presentation headers
one-by-one rather than by just narrowing the buffer.
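
A minimal sketch of hiding headers field by field, assuming the
presentation buffer holds exactly one RFC 2822 message (without the
mbox "From " separator line).  The names `my-displayed-headers' and
`my-hide-other-headers' are hypothetical, and using invisible overlays
rather than deleting the text is just one possible choice:

```elisp
(defvar my-displayed-headers '("from" "to" "cc" "subject" "date")
  "Hypothetical list of lower-case header names to keep visible.")

(defun my-hide-other-headers ()
  "Hide each header field whose name is not in `my-displayed-headers'.
Assumes the current buffer holds one message starting at `point-min'."
  (save-excursion
    (goto-char (point-min))
    ;; The header ends at the first empty line.
    (let ((header-end (if (re-search-forward "^$" nil t)
                          (point)
                        (point-max))))
      (goto-char (point-min))
      (while (and (< (point) header-end)
                  (looking-at "\\([^ \t\n:]+\\):"))
        (let ((name (downcase (match-string 1)))
              (start (point)))
          ;; A field extends over any folded continuation lines,
          ;; i.e. following lines that start with a space or tab.
          (forward-line 1)
          (while (and (< (point) header-end)
                      (looking-at "[ \t]"))
            (forward-line 1))
          (unless (member name my-displayed-headers)
            (overlay-put (make-overlay start (point))
                         'invisible t)))))))
```

Because the text is only hidden, the full header stays in the buffer
and can be revealed again by deleting the overlays.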

 > first to decode according to the character encoding (e.g. quoted-
 > printable or base64) and then to decode that result to some coding
 > system.

That's basically it.  You should do the processing on buffers, not
strings, though, and

 >        (decode-coding-string body (detect-coding-string body t))

you want to parse the coding from the *header*, not guess on the body.
If you want you can add guessing and/or user-specified MIME charsets
as a user option, but (a) almost all genuine mail today will contain
an appropriate Content-Type charset parameter, and (b) lack of such
(unless all text is US-ASCII) is an extremely strong indicator of
spam.  Few users will need to be able to read messages that have bogus
charset parameters: this feature is not immediately necessary.
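
As a rough illustration of taking the charset from the header instead
of guessing, here is a sketch that works on the current buffer.  The
`my-' functions are hypothetical names; `mail-fetch-field',
`coding-system-p' and `decode-coding-region' are existing Emacs
functions.  The sketch assumes the MIME charset name is also an Emacs
coding-system name or alias (true for common ones such as utf-8 or
iso-8859-1) and falls back to `undecided', which is exactly the
guessing that should really be an optional extra:

```elisp
(require 'mail-utils)                   ; for `mail-fetch-field'

(defun my-message-charset ()
  "Return the Content-Type charset of the current message, lower-cased.
Return \"us-ascii\" when no charset parameter is present."
  (save-excursion
    (save-restriction
      (goto-char (point-min))
      ;; `mail-fetch-field' expects the buffer narrowed to the header.
      (narrow-to-region (point-min)
                        (if (re-search-forward "^$" nil t)
                            (point)
                          (point-max)))
      (let ((case-fold-search t)
            (ctype (mail-fetch-field "content-type")))
        (if (and ctype
                 (string-match "charset=\"?\\([^\"; \t\n]+\\)" ctype))
            (downcase (match-string 1 ctype))
          "us-ascii")))))

(defun my-charset-to-coding-system (charset)
  "Map the MIME CHARSET name to an Emacs coding system.
Unknown names map to `undecided' (an assumption of this sketch)."
  ;; `intern-soft' avoids creating symbols for bogus charset names.
  (let ((sym (intern-soft (downcase charset))))
    (if (and sym (coding-system-p sym))
        sym
      'undecided)))

(defun my-decode-body ()
  "Decode the body of the current message according to its charset."
  (save-excursion
    (goto-char (point-min))
    (when (re-search-forward "^$" nil t)
      (forward-line 1)
      (decode-coding-region
       (point) (point-max)
       (my-charset-to-coding-system (my-message-charset))))))
```

This only covers the charset step; any quoted-printable or base64
transfer encoding would have to be undone before `my-decode-body' runs.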

The general algorithm should be something like

Identify message in mbox buffer
Copy message to presentation buffer
Identify header and body, add Babyl sentinels if desired
Parse headers (specifically content type)
Dispatch on content type and subtype:
    Case type is text and subtype is plain
        Identify charset parameter:
            (or charset-from-content-type "us-ascii")
        Map charset to Emacs coding-system
        (decode-coding-region (body-begin) (body-end) coding-system)
        Wash header for presentation, e.g.:
            Hide non-displayed headers
            Decode RFC 2047-encoded headers
        Wash body for presentation, e.g.:
            Highlight and activate url-like substrings
            Highlight quoted material
Display buffer in window
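
The outline above could be sketched end to end roughly as follows.
This is an illustration under assumptions, not Rmail's actual code:
the `my-' names and the buffer name are hypothetical, only text/plain
is handled, the charset name is assumed to double as an Emacs
coding-system name, and unknown charsets fall back to `undecided'.
The transfer-encoding step comes from the two-step description quoted
earlier.  `quoted-printable-decode-region', `base64-decode-region',
`decode-coding-region' and `rfc2047-decode-region' are existing Emacs
functions:

```elisp
(require 'mail-utils)                   ; `mail-fetch-field'
(require 'qp)                           ; `quoted-printable-decode-region'
(require 'rfc2047)                      ; `rfc2047-decode-region'

(defun my-header-end ()
  "Return the position of the empty line that ends the header."
  (save-excursion
    (goto-char (point-min))
    (if (re-search-forward "^$" nil t) (point) (point-max))))

(defun my-header-field (name)
  "Return the value of header field NAME in the current buffer, or nil."
  (save-excursion
    (save-restriction
      (narrow-to-region (point-min) (my-header-end))
      (mail-fetch-field name))))

(defun my-present-message (mbox-buffer beg end)
  "Copy BEG..END of MBOX-BUFFER to a view buffer and decode it there.
Return the view buffer.  Non-text/plain bodies are left untouched."
  (let ((view (get-buffer-create "*my-message-view*")))
    (with-current-buffer view
      (erase-buffer)
      (insert-buffer-substring mbox-buffer beg end)
      (let ((case-fold-search t)
            (ctype (or (my-header-field "content-type") "text/plain"))
            (cte (or (my-header-field "content-transfer-encoding")
                     "7bit")))
        ;; Dispatch on content type; only text/plain is handled here.
        (when (string-match "\\`[ \t]*text/plain\\([ \t\n;]\\|\\'\\)" ctype)
          (let* ((charset
                  (if (string-match "charset=\"?\\([^\"; \t\n]+\\)" ctype)
                      (downcase (match-string 1 ctype))
                    "us-ascii"))
                 (sym (intern-soft charset))
                 (coding (if (and sym (coding-system-p sym))
                             sym
                           'undecided))
                 (body (min (point-max) (1+ (my-header-end)))))
            ;; Step 1: undo the content transfer encoding, if any.
            (cond ((string-match "quoted-printable" cte)
                   (quoted-printable-decode-region body (point-max)))
                  ((string-match "base64" cte)
                   (base64-decode-region body (point-max))))
            ;; Step 2: turn the resulting bytes into characters.
            (decode-coding-region body (point-max) coding))))
      ;; Header wash: decode RFC 2047 encoded words.
      (rfc2047-decode-region (point-min) (my-header-end))
      (goto-char (point-min)))
    view))
```

A caller would then do something like
(display-buffer (my-present-message mbox-buffer beg end)).  Hiding
non-displayed headers and highlighting URLs or quoted text are left
out; a facility such as `goto-address-mode' might serve for the URLs.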




