From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 1 Jul 2019 11:26:21 -0400
From: Alvaro Herrera
To: David Bremner
Cc: Alexei Gilchrist, notmuch@notmuchmail.org
Subject: Re: notmuch ignoring alot of emails
Message-ID: <20190701152621.GA9546@alvherre.pgsql>
In-Reply-To: <87v9wodx9p.fsf@tethera.net>

On 2019-Jun-29, David Bremner wrote:

> David Bremner writes:
>
> > Alvaro Herrera writes:
> >> It's Content-Type/boundary that needs to be watched for.  Only consider
> >> that the file is an mbox if a "^From " line appears after the boundary
> >> end marker (which seems to be defined as "the boundary string followed
> >> by two dashes --").
> > I'm not keen on writing (more) ad hoc MIME parsing code, so if you can
> > phrase this in terms of GMime API (or at least MIME parts) it would be
> > great.

Yeah, I was having a look at the GMime API last week to have a think
about how to do it with that.

> On second thought, I guess it might not be practical to use GMime to parse
> the file, since that might perform badly on large mboxes.

I think we only need to search for the first end boundary; if there's
anything beyond that, return is_mbox true.  So we only need to fully
process the first email, and we can stop searching at that point.

-- 
Álvaro Herrera                http://www.twitter.com/alvherre
"Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"
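
To make the idea above concrete, here is a rough Python sketch of the check, not notmuch's actual code (which is C) and not a GMime-based version. The function name looks_like_mbox and the boundary regex are made up for illustration. It reads only the first message's headers, scans for that message's end boundary, and then stops at the first "From " line it sees after that point. The fallback for a first message with no boundary parameter is my own assumption; the thread does not say what to do in that case.

```python
import re
from typing import Iterable

# Hypothetical helper pattern: pulls the boundary parameter out of the raw
# header block (quoted or unquoted form). Not a full RFC 2045 parser.
BOUNDARY_RE = re.compile(r'boundary\s*=\s*(?:"([^"]+)"|([^\s;]+))', re.IGNORECASE)


def looks_like_mbox(lines: Iterable[str]) -> bool:
    """Return True if a "From " line follows the first message's end boundary."""
    it = iter(lines)

    # Collect the first message's headers (everything up to the first blank line).
    header_lines = []
    for line in it:
        if line.strip() == "":
            break
        header_lines.append(line)

    match = BOUNDARY_RE.search("".join(header_lines))
    if match is None:
        # Assumption, not from the thread: without a boundary there is no end
        # marker to wait for, so fall back to looking for any "From " line.
        return any(line.startswith("From ") for line in it)

    boundary = match.group(1) or match.group(2)
    end_marker = "--" + boundary + "--"

    # Skip the body of the first message until its closing boundary.
    for line in it:
        if line.rstrip() == end_marker:
            break
    else:
        return False  # end boundary never found: treat as a single message

    # Only now does a "From " line count; any() stops at the first hit.
    return any(line.startswith("From ") for line in it)
```

With this shape, a "From " line inside a MIME part of the first message is ignored, and the scan never goes further than the first separator after the end boundary, so a large mbox is not read in full.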