From: David Bremner
To: notmuch@notmuchmail.org
Subject: Re: [PATCH 2/2] n_m_remove_indexed_terms: reduce number of Xapian API calls.
In-Reply-To: <20190416014616.31623-3-david@tethera.net>
References: <20190416014616.31623-1-david@tethera.net> <20190416014616.31623-3-david@tethera.net>
Date: Wed, 22 May 2019 08:58:08 -0300
Message-ID: <874l5mu2en.fsf@tethera.net>

David Bremner writes:

> Previously this function scanned every term attached to a given
> Xapian document. It turns out we know how to read only the terms we
> need to preserve (and we might have already done so). This commit
> replaces many calls to Xapian::Document::remove_term with one call to
> ::clear_terms, and a (typically much smaller) number of calls to
> ::add_term. Roughly speaking this is based on the assumption that most
> messages have more text than they have tags.
>
> According to the performance test suite, this yields a roughly 40%
> speedup on "notmuch reindex '*'"

I've marked this ready to merge. If you have any feedback, please send
it ASAP.

d
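
For illustration only, the trade-off described in the quoted commit message can be sketched with a toy model. This is not the notmuch patch and not the real Xapian API: ToyDocument, the two helper functions, the term names and the term counts are all hypothetical stand-ins, chosen to show why "clear everything, then re-add the few terms to keep" needs fewer mutating calls when a message has far more body terms than tags.

```python
# Toy stand-in for a Xapian document. NOT the real Xapian API: just a set of
# terms plus a counter of mutating calls, so the two strategies can be compared.
class ToyDocument:
    def __init__(self, terms):
        self.terms = set(terms)
        self.calls = 0

    def remove_term(self, term):
        self.calls += 1
        self.terms.discard(term)

    def clear_terms(self):
        self.calls += 1
        self.terms.clear()

    def add_term(self, term):
        self.calls += 1
        self.terms.add(term)


def remove_indexed_by_scanning(doc, is_preserved):
    # Old approach: visit every term and remove the ones that are not preserved.
    for term in list(doc.terms):
        if not is_preserved(term):
            doc.remove_term(term)


def remove_indexed_by_clearing(doc, preserved_terms):
    # New approach: drop all terms with one call, then re-add the ones to keep.
    keep = list(preserved_terms)
    doc.clear_terms()
    for term in keep:
        doc.add_term(term)


# Hypothetical message: 3 tag terms to preserve, 1000 terms from body text.
tags = {"tag:inbox", "tag:unread", "tag:list"}
body = {f"word{i}" for i in range(1000)}

old = ToyDocument(tags | body)
remove_indexed_by_scanning(old, lambda t: t in tags)

new = ToyDocument(tags | body)
remove_indexed_by_clearing(new, tags)

print(old.calls, new.calls)  # 1000 4
```

Both strategies leave the same three terms on the document; only the number of mutating calls differs. The call counts here say nothing about the measured 40% figure, which comes from the performance test suite mentioned above.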