From: David Bremner
To: Alvaro Herrera, Alexei Gilchrist
Cc: notmuch@notmuchmail.org
Subject: Re: notmuch ignoring alot of emails
In-Reply-To: <20190628201121.GA8537@alvherre.pgsql>
References: <20190628201121.GA8537@alvherre.pgsql>
Date: Sat, 29 Jun 2019 21:03:42 +0200
Message-ID: <87y31kdxip.fsf@tethera.net>

Alvaro Herrera writes:

> On 2019-Jun-28, Alvaro Herrera wrote:
>
>> I think a real solution is to parse the message header, look for the
>> Content-Length, and determine mbox-ness by looking for "From" only past
>> that many bytes; that seems to match what other mail parsing tools do.
>
> Sorry, I misspoke: there's no such thing as Content-Length.
> It's Content-Type/boundary that needs to be watched for. Only consider
> that the file is an mbox if a "^From " line appears after the boundary
> end marker (which seems to be defined as "the boundary string followed
> by two dashes --").
>
> Here's a sample message, BTW:
> https://www.postgresql.org/message-id/raw/3ad5ba71-d200-96da-f903-7e3b16416140@lab.ntt.co.jp
> (username "archives", password "antispam").

I'm not keen on writing (more) ad hoc MIME parsing code, so if you can
phrase this in terms of GMime API (or at least MIME parts) it would be
great.

d
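
For reference, the heuristic Alvaro describes above could be sketched
roughly as follows. This is only an illustration in Python using the
standard library email package; it is not GMime and not what notmuch
does. The names looks_like_mbox and FROM_LINE are made up for this
sketch, and it assumes that the first occurrence of the outer closing
delimiter ("--" + boundary + "--") marks the end of the first message's
MIME body.

```python
import re
from email.parser import BytesHeaderParser

# Hypothetical helper names; nothing here is notmuch or GMime API.
FROM_LINE = re.compile(rb"^From ", re.MULTILINE)


def looks_like_mbox(data: bytes) -> bool:
    """Return True if a "From " line appears after the end of the
    first message's outer MIME boundary (or anywhere after the
    headers start, if the message is not multipart)."""
    # Skip a leading mbox separator line so it is not counted.
    if data.startswith(b"From "):
        _, _, data = data.partition(b"\n")

    # Parse only the header block to find the outer boundary.
    headers = BytesHeaderParser().parsebytes(data)
    boundary = headers.get_boundary()

    start = 0
    if boundary is not None:
        # Closing delimiter: the boundary string followed by two dashes.
        closing = b"--" + boundary.encode("ascii", "replace") + b"--"
        end = data.find(closing)
        if end != -1:
            start = end + len(closing)

    # Only "From " lines past the closing delimiter count as separators.
    return FROM_LINE.search(data, start) is not None
```

A "From " line inside a MIME part is then ignored, while one that
follows the closing delimiter still marks the file as an mbox. For a
non-multipart message the sketch falls back to scanning the whole file,
so it does not help in that case.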