From: David Bremner
To: Alvaro Herrera, Alexei Gilchrist
Cc: notmuch@notmuchmail.org
Subject: Re: notmuch ignoring alot of emails
In-Reply-To: <87y31kdxip.fsf@tethera.net>
References: <20190628201121.GA8537@alvherre.pgsql> <87y31kdxip.fsf@tethera.net>
Date: Sat, 29 Jun 2019 21:09:06 +0200
Message-ID: <87v9wodx9p.fsf@tethera.net>

David Bremner writes:

> Alvaro Herrera writes:
>
>> On 2019-Jun-28, Alvaro Herrera wrote:
>>
>>> I think a real solution is to parse the message header, look for the
>>> Content-Length, and determine mbox-ness by looking for "From" only past
>>> that many bytes; that seems to match what other mail parsing tools do.
>>
>> Sorry, I misspoke: there's no such thing as Content-Length.
>> It's Content-Type/boundary that needs to be watched for. Only consider
>> that the file is an mbox if a "^From " line appears after the boundary
>> end marker (which seems to be defined as "the boundary string followed
>> by two dashes --").
>>
>> Here's a sample message, BTW:
>> https://www.postgresql.org/message-id/raw/3ad5ba71-d200-96da-f903-7e3b16416140@lab.ntt.co.jp
>> (username "archives", password "antispam").
>
> I'm not keen on writing (more) ad hoc MIME parsing code, so if you can
> phrase this in terms of GMime API (or at least MIME parts) it would be
> great.
>
> d

On second thought, I guess it might not be practical to use GMime to
parse the file, since that might perform badly on large mboxes.

d
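
For reference, the heuristic Alvaro describes could be sketched roughly as
follows. This is only an illustration in Python, not notmuch or GMime code:
the function name and regular expression are made up for the example, it
looks only at the top-level boundary, and it assumes that falling back to
scanning the whole body is acceptable when no closing delimiter is found.

```python
import re

# Hypothetical pattern for the example: picks the boundary parameter out of
# the header block, quoted or unquoted. A real MIME parser is more careful.
BOUNDARY_RE = re.compile(rb'boundary\s*=\s*(?:"([^"]+)"|([^\s;]+))', re.IGNORECASE)


def looks_like_multi_message_mbox(data: bytes) -> bool:
    """Return True if a "From " line appears after the first message.

    Assumption: for a multipart message, "after the first message" means
    after the closing delimiter ("--" + boundary + "--").
    """
    lines = data.splitlines()

    # The header block ends at the first empty line.
    try:
        header_end = lines.index(b"")
    except ValueError:
        header_end = len(lines)

    scan_from = header_end + 1
    match = BOUNDARY_RE.search(b"\n".join(lines[:header_end]))
    if match:
        boundary = match.group(1) or match.group(2)
        closing = b"--" + boundary + b"--"
        # Skip everything up to and including the closing delimiter, so
        # that "From " lines inside MIME parts are not counted.
        for i, line in enumerate(lines[scan_from:], start=scan_from):
            if line.rstrip() == closing:
                scan_from = i + 1
                break
        # If no closing delimiter is found, scan_from stays at the start
        # of the body (assumed fallback, see above).

    return any(line.startswith(b"From ") for line in lines[scan_from:])
```

A single multipart message whose body contains a line starting with "From "
is then not mistaken for an mbox, while the same message followed by a
second "From " separator line is. Messages without a boundary are scanned
from the start of the body, so they would need some other treatment.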