From: David Bremner
To: notmuch@notmuchmail.org
Subject: Re: [PATCH 1/7] test: add known broken test for indexing html
In-Reply-To: <20170322112306.12060-2-david@tethera.net>
References: <20170322112306.12060-1-david@tethera.net> <20170322112306.12060-2-david@tethera.net>
Date: Sun, 16 Jul 2017 09:44:49 -0300
Message-ID: <871spgczzy.fsf@tethera.net>
MIME-Version: 1.0
Content-Type: text/plain
List-Id: "Use and development of the notmuch mail system."

David Bremner writes:

> 'quite' on IRC reported that notmuch new was grinding to a halt during
> initial indexing, and we eventually narrowed the problem down to some
> html parts with large embedded images.
> These cause the number of terms
> added to the Xapian database to explode (the first 400 messages
> generated 4.6M unique terms), and of course the resulting terms are
> not much use for searching.

this bug is fixed in master / release d