From: "Stephen J. Turnbull"
Newsgroups: gmane.emacs.devel
Subject: Need some help with Rmail/mbox
Date: Fri, 19 Sep 2008 12:28:41 +0900
Message-ID: <87y71o4xw6.fsf@xemacs.org>
To: pmr@pajato.com
Cc: emacs-devel@gnu.org
Paul Michael Reilly writes:

> As near as I can tell the task is to decode the message body in two
> steps:

But why not just use the existing code to do this? AIUI, the Babyl format was designed for one-buffer operation on a pseudo-RFC-822 message, so most functions used to wash and display messages probably assume that the message is in the current buffer, narrowed so that the presentation header plus the body form an RFC 2822 message. All you should need to do for a first cut is copy the message to a new buffer, which doesn't need to be narrowed but might need some Babyl sentinels added.

If I'm missing something, feel free to ignore me, but I don't really understand what you think is different about presenting a free-standing RFC 2822 message as opposed to presenting one that is part of a Babyl-format buffer. I don't think they should be that different. The main difference is that the Babyl format caches the set of presentation headers in the Babyl-format file, but mbox won't. So you'll need to hide (or remove) the non-presentation headers one by one rather than just narrowing the buffer.

> first to decode according to the character encoding (e.g.
> quoted-printable or base64) and then to decode that result to some coding
> system.

That's basically it. (Strictly speaking, quoted-printable and base64 are Content-Transfer-Encodings, not character encodings; the charset is what the second step decodes.) You should do the processing on buffers, not strings, though, and

> (decode-coding-string body (detect-coding-string body t))

you want to parse the charset from the *header*, not guess it from the body. If you want, you can add guessing and/or user-specified MIME charsets as a user option, but (a) almost all genuine mail today contains an appropriate Content-Type charset parameter, and (b) the lack of one (unless all the text is US-ASCII) is an extremely strong indicator of spam. Few users will need to read messages that have bogus charset parameters, so this feature is not immediately necessary.

The general algorithm should be something like:

    Identify message in mbox buffer
    Copy message to presentation buffer
    Identify header and body, add Babyl sentinels if desired
    Parse headers (specifically Content-Type)
    Dispatch on content type and subtype:
        Case type is text and subtype is plain:
            Undo Content-Transfer-Encoding (quoted-printable or base64)
            Identify charset parameter:
                (or charset-from-content-type "us-ascii")
            Map charset to Emacs coding-system
            (decode-coding-region (body-begin) (body-end) coding-system)
    Wash header for presentation, e.g.:
        Hide non-displayed headers
        Decode RFC 2047-encoded headers
    Wash body for presentation, e.g.:
        Highlight and activate URL-like substrings
        Highlight quoted material
    Display buffer in window
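As a rough illustration of the text/plain branch of that outline, here is a minimal Emacs Lisp sketch. It assumes the message has already been copied into a buffer of its own, and it maps the charset to a coding system by simply interning the charset name (Gnus' `mm-charset-to-coding-system` handles aliases and overrides more carefully). The `my-rmail-` names are hypothetical, not existing Rmail functions.

```elisp
;;; -*- lexical-binding: t -*-
(require 'mail-utils)   ; mail-fetch-field
(require 'mail-parse)   ; mail-header-parse-content-type
(require 'qp)           ; quoted-printable-decode-region

(defun my-rmail-body-start ()
  "Return the position of the first body character in the current buffer.
The header ends at the first empty line (RFC 2822)."
  (save-excursion
    (goto-char (point-min))
    (if (search-forward "\n\n" nil t) (point) (point-max))))

(defun my-rmail-charset-coding-system (charset)
  "Map MIME CHARSET (a string) to an Emacs coding system.
Simplification: assumes the charset name is also a coding-system name,
and falls back to `us-ascii' (the RFC 2045 default) when it is not."
  (let ((cs (intern (downcase charset))))
    (if (coding-system-p cs) cs 'us-ascii)))

(defun my-rmail-decode-text-plain-body ()
  "Decode the body of the text/plain message in the current buffer, in place.
Messages of other content types are left alone."
  (let* ((body-start (my-rmail-body-start))
         ;; `mail-fetch-field' expects the buffer narrowed to the header.
         (fields (save-restriction
                   (narrow-to-region (point-min) body-start)
                   (list (mail-fetch-field "Content-Type")
                         (mail-fetch-field "Content-Transfer-Encoding"))))
         ;; e.g. ("text/plain" (charset . "iso-8859-1"))
         (ctype (and (car fields)
                     (mail-header-parse-content-type (car fields))))
         (cte (downcase (or (cadr fields) "7bit"))))
    ;; RFC 2045: a missing Content-Type means text/plain; charset=us-ascii.
    (when (string= (downcase (or (car ctype) "text/plain")) "text/plain")
      ;; Step 1: undo the Content-Transfer-Encoding, leaving raw bytes.
      (cond ((string= cte "quoted-printable")
             (quoted-printable-decode-region body-start (point-max)))
            ((string= cte "base64")
             (base64-decode-region body-start (point-max))))
      ;; Step 2: decode those bytes with the charset named in the header,
      ;; rather than guessing from the body.
      (let ((charset (or (cdr (assq 'charset (cdr ctype))) "us-ascii")))
        (decode-coding-region body-start (point-max)
                              (my-rmail-charset-coding-system charset))))))
```

Working on the buffer region rather than on strings, as suggested above, lets each step modify the presentation buffer in place.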
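Since an mbox message has no cached presentation header to narrow to, the non-displayed fields have to be hidden one at a time. A minimal sketch, assuming a hypothetical `my-rmail-displayed-headers` list and using the `invisible` text property; a real implementation would presumably honor `rmail-ignored-headers` instead.

```elisp
;;; -*- lexical-binding: t -*-
(require 'regexp-opt)

(defvar my-rmail-displayed-headers '("From" "To" "Cc" "Date" "Subject")
  "Hypothetical list of header fields to leave visible.")

(defun my-rmail-hide-headers ()
  "Make each header field not named in `my-rmail-displayed-headers' invisible.
Assumes the current buffer holds a single RFC 2822 message."
  (save-excursion
    (goto-char (point-min))
    (let ((case-fold-search t)
          (keep (concat "\\`\\(?:" (regexp-opt my-rmail-displayed-headers)
                        "\\):")))
      ;; The header ends at the first empty line.
      (while (not (or (eobp) (looking-at "$")))
        (let ((start (point)))
          ;; A field continues over following lines that begin with whitespace.
          (forward-line 1)
          (while (looking-at "[ \t]")
            (forward-line 1))
          (unless (string-match-p keep (buffer-substring start (point)))
            (put-text-property start (point) 'invisible t)))))))
```

Hiding fields instead of deleting them keeps the full header in the buffer, so a command could later toggle it back on.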