From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from localhost (localhost [127.0.0.1])
	by arlo.cworth.org (Postfix) with ESMTP id D71536DE1059
	for ; Fri, 29 Mar 2019 06:17:40 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at cworth.org
X-Spam-Flag: NO
X-Spam-Score: -0.02
X-Spam-Level:
X-Spam-Status: No, score=-0.02 tagged_above=-999 required=5
	tests=[AWL=-0.019, SPF_PASS=-0.001] autolearn=disabled
Received: from arlo.cworth.org ([127.0.0.1])
	by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id AxByAFpXTgbb for ; Fri, 29 Mar 2019 06:17:40 -0700 (PDT)
Received: from fethera.tethera.net (fethera.tethera.net [198.245.60.197])
	by arlo.cworth.org (Postfix) with ESMTPS id E3A916DE0F0D
	for ; Fri, 29 Mar 2019 06:17:39 -0700 (PDT)
Received: from remotemail by fethera.tethera.net with local (Exim 4.89)
	(envelope-from ) id 1h9rO5-00021S-7U
	for notmuch@notmuchmail.org; Fri, 29 Mar 2019 09:17:37 -0400
Received: (nullmailer pid 28598 invoked by uid 1000);
	Fri, 29 Mar 2019 13:17:35 -0000
From: David Bremner
To: notmuch@notmuchmail.org
Subject: Re: [PATCH] lib: add 'body:' field, stop indexing headers twice.
In-Reply-To: <20190319003921.5517-1-david@tethera.net>
References: <20190313114450.12163-1-david@tethera.net>
	<20190319003921.5517-1-david@tethera.net>
Date: Fri, 29 Mar 2019 10:17:35 -0300
Message-ID: <87k1ghyg40.fsf@tethera.net>
MIME-Version: 1.0
Content-Type: text/plain
X-BeenThere: notmuch@notmuchmail.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Use and development of the notmuch mail system."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 29 Mar 2019 13:17:40 -0000

David Bremner writes:

> This follows a suggestion of Olly Betts to use the facility (since
> Xapian 1.0.4) to add the same field with multiple prefixes.
> The double indexing of previous versions is thus replaced with a
> query time expansion of unprefixed query terms to the various
> prefixed equivalents.

This patch leads to approximately a 10% decrease in database size on
our performance suite (2.1G -> 1.9G) before compaction. After
compaction, old / new is 1.4G -> 1.3G.

With the caveat that the benchmark machine was not completely idle, it
also leads to a roughly 10% speedup.

Existing indexing:

T00-new.sh: Testing notmuch new [0.4 large]
                         Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
  Initial notmuch new     565.17  534.82   28.22    474632  0/13854576
  notmuch new #2            0.03    0.00    0.00      9512  0/160
  notmuch new #3            0.00    0.00    0.00      9368  0/8
  notmuch new #4            0.00    0.00    0.00      9412  0/8
  notmuch new #5            0.00    0.00    0.00      9384  0/8
  notmuch new #6            0.00    0.00    0.00      9388  0/8

T01-dump-restore.sh: Testing dump and restore [0.4 large]
                         Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
  load nmbug tags          16.25    2.65    3.05     12668  104/40104
  dump *                    3.90    3.79    0.10     26048  0/27928
  restore *                 4.51    4.10    0.41      9564  0/0

T02-tag.sh: Testing tagging [0.4 large]
                         Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
  tag * +new_tag          374.69  197.56  169.55    118644  0/1818656
  tag * +existing_tag       0.00    0.00    0.00      9232  0/0
  tag * -existing_tag     318.47  151.46  164.56     36260  0/1819584
  tag * -missing_tag        0.00    0.00    0.00      9336  0/0

T03-reindex.sh: Testing tagging [0.4 large]
                         Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
  reindex *               688.27  488.02  197.59  11142680  0/4908120
  reindex *               648.04  456.06  191.78  11139124  0/2696120
  reindex *               650.70  459.08  191.48  11139088  0/2696680

T04-thread-subquery.sh: Testing thread subqueries [0.4 large]
                         Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
  search thread:{} ...      2.45    2.29    0.15     94696  0/144
  search thread:{} ...      2.43    2.23    0.20     94228  0/144
  search thread:{} ...      2.46    2.26    0.20     94224  0/144

With new indexing:

T00-new.sh: Testing notmuch new [0.4 large]
                         Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
  Initial notmuch new     494.31  466.96   24.28    447428  0/12093344
  notmuch new #2            0.03    0.00    0.00      9356  0/144
  notmuch new #3            0.01    0.01    0.00      9420  0/8
  notmuch new #4            0.00    0.00    0.00      9388  0/8
  notmuch new #5            0.00    0.00    0.00      9416  0/8
  notmuch new #6            0.01    0.00    0.01      9424  0/8

T01-dump-restore.sh: Testing dump and restore [0.4 large]
                         Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
  load nmbug tags          14.21    2.41    2.71     12664  0/38952
  dump *                    3.70    3.57    0.12     26092  0/27928
  restore *                 4.19    3.78    0.41      9412  0/0

T02-tag.sh: Testing tagging [0.4 large]
                         Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
  tag * +new_tag          353.31  183.89  161.49    111244  0/1693872
  tag * +existing_tag       0.00    0.00    0.00      9316  0/0
  tag * -existing_tag     284.07  137.15  144.33     36712  0/1659200
  tag * -missing_tag        0.00    0.00    0.00      9240  0/0

T03-reindex.sh: Testing tagging [0.4 large]
                         Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
  reindex *               640.19  431.23  196.99  10214564  1510/4504024
  reindex *               611.46  412.37  193.07  10211852  1056/2557688
  reindex *               612.95  415.40  194.97  10211848  0/2555032

T04-thread-subquery.sh: Testing thread subqueries [0.4 large]
                         Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
  search thread:{} ...      2.34    2.12    0.21     96452  0/144
  search thread:{} ...      2.35    2.17    0.18     96208  0/144
  search thread:{} ...      2.33    2.08    0.25     94740  0/144
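
For anyone who wants to check the rough 10% figures, the relative changes can be recomputed from the numbers quoted above. What follows is only a minimal Python sketch, not part of the notmuch performance suite: the helper names `percent_decrease` and `summarize` are made up for illustration, and only a handful of rows (the wall-clock column and the database sizes) are copied in by hand.

```python
# Wall-clock seconds copied by hand from the benchmark output in this
# message: (existing indexing, new indexing). Only a few rows are included.
WALL_SECONDS = {
    "Initial notmuch new": (565.17, 494.31),
    "tag * +new_tag": (374.69, 353.31),
    "tag * -existing_tag": (318.47, 284.07),
    "reindex * (first run)": (688.27, 640.19),
}

# Database sizes in GB as quoted in the message: (old, new).
DB_SIZE_GB = {
    "before compaction": (2.1, 1.9),
    "after compaction": (1.4, 1.3),
}


def percent_decrease(old: float, new: float) -> float:
    """Return how much smaller `new` is than `old`, as a percentage of `old`."""
    return (old - new) / old * 100.0


def summarize(table: dict) -> dict:
    """Map each row name to its percentage decrease, rounded to one decimal."""
    return {
        name: round(percent_decrease(old, new), 1)
        for name, (old, new) in table.items()
    }


if __name__ == "__main__":
    for title, table in (("wall time", WALL_SECONDS), ("database size", DB_SIZE_GB)):
        print(title)
        for name, pct in summarize(table).items():
            print(f"  {name:<24} {pct:5.1f}% smaller")
```

The per-row numbers vary from one test to the next, which fits the caveat that the machine was not completely idle during the runs.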