Date: Fri, 28 Jun 2019 16:11:21 -0400
From: Alvaro Herrera
To: Alexei Gilchrist
Cc: notmuch@notmuchmail.org
Subject: Re: notmuch ignoring alot of emails
Message-ID: <20190628201121.GA8537@alvherre.pgsql>
In-Reply-To: <20190628171626.GA20853@alvherre.pgsql>

On 2019-Jun-28, Alvaro Herrera wrote:

> I think a real solution is to parse the message header, look for the
> Content-Length, and determine mbox-ness by looking for "From" only past
> that many bytes; that seems to match what other mail parsing tools do.

Sorry, I misspoke: there's no such thing as Content-Length.  It's
Content-Type/boundary that needs to be watched for.  Only consider that
the file is an mbox if a "^From " line appears after the boundary end
marker (which seems to be defined as "the boundary string followed by
two dashes --").
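
A rough sketch of that check in Python, only to make the idea concrete. It is not notmuch's code (notmuch is written in C), and the names looks_like_mbox and FROM_LINE are made up for the illustration. It assumes the closing delimiter is the line "--" + boundary + "--", as RFC 2046 defines it, and that a file with no closing delimiter holds a single truncated message. For a message with no boundary at all, it simply falls back to looking for any "From " line, which is an assumption of this sketch rather than something proposed above.

```python
import re
from email.parser import BytesHeaderParser

# An mbox separator is a line that starts with "From " (note the space).
FROM_LINE = re.compile(rb"^From ", re.MULTILINE)


def looks_like_mbox(data: bytes) -> bool:
    """Hypothetical helper: guess whether data holds more than one message."""
    # Drop a leading "From " separator so it is not counted as a second message.
    if data.startswith(b"From "):
        data = data.partition(b"\n")[2]

    # Parse only the header block and read the Content-Type boundary parameter.
    boundary = BytesHeaderParser().parsebytes(data).get_boundary()

    start = 0
    if boundary is not None:
        # Closing delimiter: "--" + boundary + "--" alone on a line.
        closing = re.compile(
            rb"^--" + re.escape(boundary.encode("ascii", "replace")) + rb"--[ \t]*\r?$",
            re.MULTILINE,
        )
        match = closing.search(data)
        if match is None:
            # Assumption: no closing delimiter means one truncated message.
            return False
        start = match.end()

    # Only a "From " line past the closing delimiter counts as a separator.
    return FROM_LINE.search(data, start) is not None
```

With this, a body line such as "From the start, hello" inside a MIME part no longer makes a single message look like an mbox, while a real separator after the closing delimiter still does.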

Here's a sample message, BTW:
https://www.postgresql.org/message-id/raw/3ad5ba71-d200-96da-f903-7e3b16416140@lab.ntt.co.jp
(username "archives", password "antispam").

-- 
Álvaro Herrera                Valdivia, Chile