From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from localhost (localhost [127.0.0.1])
 by olra.theworths.org (Postfix) with ESMTP id 764C9431FB6
 for ; Fri, 9 Mar 2012 08:20:06 -0800 (PST)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
X-Spam-Flag: NO
X-Spam-Score: 0
X-Spam-Level:
X-Spam-Status: No, score=0 tagged_above=-999 required=5 tests=[none]
 autolearn=disabled
Received: from olra.theworths.org ([127.0.0.1])
 by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
 with ESMTP id cyu-4A2-tyRv for ;
 Fri, 9 Mar 2012 08:20:05 -0800 (PST)
Received: from che.mayfirst.org (che.mayfirst.org [209.234.253.108])
 by olra.theworths.org (Postfix) with ESMTP id CBEE2431FAE
 for ; Fri, 9 Mar 2012 08:20:05 -0800 (PST)
Received: from [192.168.23.207] (dsl254-070-154.nyc1.dsl.speakeasy.net [216.254.70.154])
 by che.mayfirst.org (Postfix) with ESMTPSA id 102AFF970;
 Fri, 9 Mar 2012 11:20:01 -0500 (EST)
Message-ID: <4F5A2DB1.7040301@fifthhorseman.net>
Date: Fri, 09 Mar 2012 11:20:01 -0500
From: Daniel Kahn Gillmor
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:9.0) Gecko/20120125 Icedove/9.0.1
MIME-Version: 1.0
To: David Bremner , notmuch
Subject: Re: [PATCH] mime_node_open: skip envelope from lines at the start of messages
References: <1331301387-23630-1-git-send-email-david@tethera.net>
In-Reply-To: <1331301387-23630-1-git-send-email-david@tethera.net>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: "Use and development of the notmuch mail system."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 09 Mar 2012 16:20:06 -0000

On 03/09/2012 08:56 AM, David Bremner wrote:
> Some MDAs such as procmail (in MH mode), and exim (doing local
> delivery in some configurations of the appendfile transport) add a
> line to the front of a message with "From " followed by envelope
> sender. Since this is not a proper RFC822 header field, gmime (at
> least since version 2.6) refuses to parse it, unless in mbox mode.
>
> This change reads the line of the file, and if they start with
> "From ", pass the stream to gmime starting from the second line.
>
> This makes mime_node_open more consistent with (but still stricter
> than) the permissive behaviour of notmuch_file_get_header
> (message-file.c), which allows a certain number of "broken_headers".
>
> We avoid putting gmime into mbox mode in case of side effects; this
> leaves the situation of mboxes accidentally indexed by notmuch the
> same as before, namely "undefined behaviour". Ideally they should at
> least be warned by notmuch-new. Although strict rfc822 adherence
> would be one way to detect mboxes, it doesn't seem to fit with the
> spirit or code of message-file.c.

The above justification (and the version of the associated patch without the memory leak and using strncmp instead of strcmp) seems good to me.

While I'd prefer to have nothing but spic-and-span, perfectly clean RFC2822 messages, we have (perhaps accidentally) traditionally supported message files with leading "From " lines, so they will be already-indexed by previous versions of notmuch.

This patch defines the non-MIME variance we're willing to accept quite narrowly (just a single leading line that starts with "From ", no escaping of the rest of the text), avoids breaking compatibility with existing indexes, and supports indexing messages from some plausible MTA delivery configurations.
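To make the narrow rule concrete, here is a minimal sketch of the check described above. It is not the code from the patch under review: the function name skip_envelope_from is hypothetical, and it works on a plain stdio FILE rather than on the gmime stream that mime_node_open would actually hand to the parser. It only shows the idea of comparing the first five bytes with strncmp and, on a match, starting the parse at the second line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (illustration only, not the reviewed patch).
 *
 * Returns the byte offset at which MIME parsing should begin:
 *   0   if the first line does not start with "From ",
 *   >0  the offset of the second line if it does,
 *   -1  on I/O error.
 * On success the file position is left at the returned offset. */
long
skip_envelope_from (FILE *file)
{
    static const char prefix[] = "From ";
    const size_t prefix_len = sizeof (prefix) - 1;
    char buf[sizeof (prefix)];
    size_t n;
    int c;

    if (fseek (file, 0L, SEEK_SET) != 0)
        return -1;

    n = fread (buf, 1, prefix_len, file);
    if (ferror (file))
        return -1;

    /* strncmp with an explicit length: buf is not NUL-terminated, and
     * only the prefix matters, not the rest of the line. */
    if (n < prefix_len || strncmp (buf, prefix, prefix_len) != 0) {
        if (fseek (file, 0L, SEEK_SET) != 0)
            return -1;
        return 0;
    }

    /* Consume the remainder of the envelope line, newline included. */
    while ((c = fgetc (file)) != EOF && c != '\n')
        ;
    if (ferror (file))
        return -1;

    return ftell (file);
}
```

Only the first line is examined, so a later "From " line in the body is left alone, which matches the "single leading line, no escaping" scope described above.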
The only way it would be better is if it were to auto-detect that a file is actually a multi-message mbox, and alert the user to the fact that all but the first message in the mbox are unindexed. But we don't currently do that anyway, so it's not a regression (and that additional cleanup should probably be a separate patch anyway).

so: LGTM.

--dkg
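
For reference, the multi-message mbox detection mentioned above could be approximated with a heuristic like the one below. This is only a sketch under stated assumptions: count_mbox_separators is a hypothetical name, nothing like it exists in notmuch as far as this message indicates, and it assumes the common mbox convention that each separator is a line starting with "From " at the start of the file or directly after a blank line. A result greater than 1 would suggest a multi-message mbox; an unescaped "From " line after a blank line in an ordinary message body would be a false positive.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical heuristic (illustration only): count lines that look
 * like mbox separators, i.e. lines starting with "From " that are
 * either the first line of the file or preceded by a blank line. */
int
count_mbox_separators (FILE *file)
{
    char line[1024];
    int count = 0;
    int prev_blank = 1;     /* start of file is treated like a blank line */
    int at_line_start = 1;  /* false while reading the tail of a long line */

    rewind (file);
    while (fgets (line, sizeof (line), file) != NULL) {
        size_t len = strlen (line);

        if (at_line_start) {
            if (prev_blank && strncmp (line, "From ", 5) == 0)
                count++;
            prev_blank = (line[0] == '\n' ||
                          (line[0] == '\r' && line[1] == '\n'));
        }
        /* A chunk that does not end in a newline means the next fgets
         * call continues the same physical line. */
        at_line_start = (len > 0 && line[len - 1] == '\n');
    }
    return count;
}
```

A caller such as notmuch-new could warn when the count exceeds 1, but whether and where to do that is exactly the separate cleanup left out of this patch.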