From: Tomi Ollila
Cc: notmuch@notmuchmail.org
Subject: Re: notmuch ignoring alot of emails
In-Reply-To: <20190628201121.GA8537@alvherre.pgsql>
References: <20190628201121.GA8537@alvherre.pgsql>
User-Agent: Notmuch/0.28.3+84~g41389bb (https://notmuchmail.org) Emacs/25.2.1 (x86_64-unknown-linux-gnu)
X-List-Received-Date: Sun, 30 Jun 2019 17:29:35 -0000

On Fri, Jun 28 2019, Alvaro Herrera wrote:

> On 2019-Jun-28, Alvaro Herrera wrote:
>
>> I think a real solution is to parse the message header, look for the
>> Content-Length, and determine mbox-ness by looking for "From" only past
>> that many bytes; that seems to match what other mail parsing tools do.
>
> Sorry, I misspoke: there's no such thing as Content-Length.
> It's Content-Type/boundary that needs to be watched for. Only consider
> that the file is an mbox if a "^From " line appears after the boundary
> end marker (which seems to be defined as "the boundary string followed
> by two dashes --").

Just checking for lines starting with 'From ' would be pretty naïve, since
"From" may be the first word of any line in the text body.

If we had to do content scanning, then at least an empty line before
"From" would be required, and the next lines would have to start like

Received: someone@not.an.example
Date: a date
From: someone

(and then an empty line... ;)

All this checking would be required, and it could still fail (perhaps
this content gets modified on the fly, but then the signature check, if
this mail had one, could fail...)

If there were a header that tells the length of the body, then things
could be easier...

Tomi

>
> -- 
> Álvaro Herrera Valdivia, Chile
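A minimal Python sketch of the stricter content-scanning check described above, assuming the file is already decoded into text. The names `looks_like_separator`, `count_messages` and the `min_headers` threshold are hypothetical; this is not notmuch's actual detection code, and it does not implement the MIME-boundary tracking suggested in the quoted text. A "From " line is accepted as a message separator only if it starts the file or follows an empty line, and is followed by header-like lines up to an empty line.

```python
import re

# Hypothetical loose header-line pattern: a field name made of printable
# ASCII characters other than space and ':', immediately followed by ':'.
HEADER_RE = re.compile(r"^[!-9;-~]+:")


def looks_like_separator(lines, i, min_headers=2):
    """Return True if lines[i] looks like an mbox "From " separator.

    Heuristic sketch (not notmuch's logic):
      1. the line starts with "From ";
      2. it is the first line, or the previous line is empty;
      3. it is followed by at least `min_headers` header-like lines,
         terminated by an empty line.
    """
    if not lines[i].startswith("From "):
        return False
    if i > 0 and lines[i - 1] != "":
        return False
    count = 0
    for line in lines[i + 1:]:
        if line == "":
            # Empty line ends the candidate header block.
            return count >= min_headers
        if HEADER_RE.match(line):
            count += 1
        elif line[:1] in (" ", "\t") and count > 0:
            continue  # folded continuation of the previous header line
        else:
            return False  # ordinary body text: not a header block
    return False  # input ended before the header block was terminated


def count_messages(text):
    """Count the lines that pass the separator heuristic."""
    lines = text.split("\n")
    return sum(looks_like_separator(lines, i) for i in range(len(lines)))
```

As the message warns, this can still be fooled: a body that contains an empty line, then "From someone" followed by a block like the Received/Date/From example and an empty line, is counted as the start of a new message.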