From: David Bremner
To: Eirik Byrkjeflot Anonsen, Alvaro Herrera, Tomi Ollila
Cc: notmuch@notmuchmail.org
Subject: Re: notmuch ignoring alot of emails
Date: Tue, 19 Nov 2019 16:18:11 -0400
Message-ID: <87wobvsjr0.fsf@tethera.net>
In-Reply-To: <87blt9tdjj.fsf@eirikba.org>
References: <20190701153657.GA9961@alvherre.pgsql> <87y2wfsorq.fsf@tethera.net>
 <87eey7szz6.fsf@eirikba.org> <87v9rismzz.fsf@tethera.net>
 <87k17ysjj0.fsf@tethera.net> <87blt9tdjj.fsf@eirikba.org>

Eirik Byrkjeflot Anonsen writes:

> Then I can really only see three alternatives:
>
> 1. Ignore any "From " lines that aren't followed by something that looks
>    like it could reasonably be a mail header (as Tomi suggested). My
>    suspicion is that this would eliminate almost all false positives.
>    (Outside of mailing lists discussing mboxes, at least.)

This seems more hopeful to me than relying on Content-Length. I tried
(but failed) to quickly understand what GMime is doing to decide if
something is an mbox, but it seems possible that Jeff S (GMime
maintainer) might be receptive to something along those lines. There is
a GMIME_FORMAT_MBOX, but maybe something like GMIME_FORMAT_RFC4155 could
specify a stricter mbox format where each "message" should be roughly
RFC2822 formatted.
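
To make the heuristic concrete, here is a rough Python sketch of what I
understand the suggestion to be: a "From " line only counts as a message
separator if the next line looks like a header field. This is not what
GMime does today, and the names (HEADER_RE, is_separator, split_messages)
are made up for illustration.

```python
import re

# An RFC 2822 field name is printable ASCII without ":", followed by ":".
HEADER_RE = re.compile(r"^[!-9;-~]+:")


def is_separator(lines, i):
    """Return True if lines[i] looks like a genuine mbox "From " separator."""
    if not lines[i].startswith("From "):
        return False
    # Assumption: a real separator is immediately followed by a header line.
    return i + 1 < len(lines) and HEADER_RE.match(lines[i + 1]) is not None


def split_messages(text):
    """Split mbox-like text into messages using the stricter separator test."""
    lines = text.splitlines()
    messages, current = [], None
    for i, line in enumerate(lines):
        if is_separator(lines, i):
            if current is not None:
                messages.append("\n".join(current))
            current = []  # start a new message; the separator itself is dropped
            continue
        if current is not None:
            current.append(line)
    if current is not None:
        messages.append("\n".join(current))
    return messages
```

A body line such as "From here on it is prose" stays inside its message
unless the line after it happens to start with something like "word:",
which is the remaining false positive mentioned in the quoted text.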