From: David Bremner <david@tethera.net>
To: notmuch@notmuchmail.org
Subject: Re: [PATCH] lib: add 'body:' field, stop indexing headers twice.
Date: Fri, 29 Mar 2019 10:17:35 -0300
Message-ID: <87k1ghyg40.fsf@tethera.net>
In-Reply-To: <20190319003921.5517-1-david@tethera.net>
David Bremner <david@tethera.net> writes:
> This follows a suggestion of Olly Betts to use the facility (since
> Xapian 1.0.4) to add the same field with multiple prefixes. The double
> indexing of previous versions is thus replaced with a query time
> expansion of unprefixed query terms to the various prefixed
> equivalents.
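To make the difference concrete, here is a toy model in plain Python of the two schemes described in the quoted paragraph. It does not use Xapian or notmuch. The prefixes XSUBJECT and XFROM, the empty prefix for body terms, and all function names are assumptions chosen for illustration. In Xapian itself, the query-time side corresponds to calling QueryParser::add_prefix more than once with the same field name, so that one field maps to several prefixes.

```python
# Toy model of the two indexing schemes; NOT notmuch or Xapian code.
# Assumptions: header words get the hypothetical prefixes below, and body
# words are stored with an empty prefix.
HEADER_PREFIXES = {"subject": "XSUBJECT", "from": "XFROM"}
BODY_PREFIX = ""


def index_message(headers, body, double_index):
    """Return the set of terms stored for one message."""
    terms = set()
    for field, text in headers.items():
        for word in text.lower().split():
            terms.add(HEADER_PREFIXES[field] + word)
            if double_index:
                # Old scheme: header words are also stored unprefixed.
                terms.add(word)
    for word in body.lower().split():
        terms.add(BODY_PREFIX + word)
    return terms


def expand(word):
    """Query-time expansion of an unprefixed word to every prefixed form."""
    return {BODY_PREFIX + word} | {p + word for p in HEADER_PREFIXES.values()}


def search_all(terms, word, expand_query):
    """Unprefixed search: plain lookup (old) or OR over the expansion (new)."""
    candidates = expand(word) if expand_query else {word}
    return bool(terms & candidates)


def search_body(terms, word):
    """A 'body:'-style search that looks only at body (unprefixed) terms."""
    return (BODY_PREFIX + word) in terms


headers = {"subject": "Xapian prefixes", "from": "Olly"}
body = "query time expansion"

old = index_message(headers, body, double_index=True)
new = index_message(headers, body, double_index=False)

print(len(old), len(new))  # 9 6
print(search_all(old, "olly", expand_query=False),
      search_all(new, "olly", expand_query=True))  # True True
print(search_body(old, "olly"), search_body(new, "olly"))  # True False
```

In this model the term counts show where the space saving comes from, and the last line shows why a body-only search only becomes meaningful once header words stop being duplicated as unprefixed terms.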
This patch reduces the database size on our performance suite by
approximately 10% before compaction (2.1G -> 1.9G). After compaction
the sizes are 1.4G (old) and 1.3G (new).
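For reference, the percentages implied by the sizes quoted above can be checked with a little arithmetic. The sizes are only given to 0.1G, so treat the results as approximate; the helper name is just for illustration.

```python
def percent_reduction(old, new):
    """Percentage decrease when going from size old to size new."""
    return 100.0 * (old - new) / old


# Database sizes in GB, as quoted above (old -> new).
before_compaction = percent_reduction(2.1, 1.9)
after_compaction = percent_reduction(1.4, 1.3)

print(f"before compaction: {before_compaction:.1f}%")  # 9.5%
print(f"after compaction:  {after_compaction:.1f}%")   # 7.1%
```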
With the caveat that the benchmark machine was not completely idle, it
also leads to a roughly 10% speedup.
Existing indexing:
T00-new.sh: Testing notmuch new [0.4 large]
                      Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
Initial notmuch new    565.17  534.82   28.22    474632  0/13854576
notmuch new #2           0.03    0.00    0.00      9512  0/160
notmuch new #3           0.00    0.00    0.00      9368  0/8
notmuch new #4           0.00    0.00    0.00      9412  0/8
notmuch new #5           0.00    0.00    0.00      9384  0/8
notmuch new #6           0.00    0.00    0.00      9388  0/8
T01-dump-restore.sh: Testing dump and restore [0.4 large]
                      Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
load nmbug tags         16.25    2.65    3.05     12668  104/40104
dump *                   3.90    3.79    0.10     26048  0/27928
restore *                4.51    4.10    0.41      9564  0/0
T02-tag.sh: Testing tagging [0.4 large]
                      Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
tag * +new_tag         374.69  197.56  169.55    118644  0/1818656
tag * +existing_tag      0.00    0.00    0.00      9232  0/0
tag * -existing_tag    318.47  151.46  164.56     36260  0/1819584
tag * -missing_tag       0.00    0.00    0.00      9336  0/0
T03-reindex.sh: Testing tagging [0.4 large]
                      Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
reindex *              688.27  488.02  197.59  11142680  0/4908120
reindex *              648.04  456.06  191.78  11139124  0/2696120
reindex *              650.70  459.08  191.48  11139088  0/2696680
T04-thread-subquery.sh: Testing thread subqueries [0.4 large]
                      Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
search thread:{} ...     2.45    2.29    0.15     94696  0/144
search thread:{} ...     2.43    2.23    0.20     94228  0/144
search thread:{} ...     2.46    2.26    0.20     94224  0/144
With new indexing:
T00-new.sh: Testing notmuch new [0.4 large]
                      Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
Initial notmuch new    494.31  466.96   24.28    447428  0/12093344
notmuch new #2           0.03    0.00    0.00      9356  0/144
notmuch new #3           0.01    0.01    0.00      9420  0/8
notmuch new #4           0.00    0.00    0.00      9388  0/8
notmuch new #5           0.00    0.00    0.00      9416  0/8
notmuch new #6           0.01    0.00    0.01      9424  0/8
T01-dump-restore.sh: Testing dump and restore [0.4 large]
                      Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
load nmbug tags         14.21    2.41    2.71     12664  0/38952
dump *                   3.70    3.57    0.12     26092  0/27928
restore *                4.19    3.78    0.41      9412  0/0
T02-tag.sh: Testing tagging [0.4 large]
                      Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
tag * +new_tag         353.31  183.89  161.49    111244  0/1693872
tag * +existing_tag      0.00    0.00    0.00      9316  0/0
tag * -existing_tag    284.07  137.15  144.33     36712  0/1659200
tag * -missing_tag       0.00    0.00    0.00      9240  0/0
T03-reindex.sh: Testing tagging [0.4 large]
                      Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
reindex *              640.19  431.23  196.99  10214564  1510/4504024
reindex *              611.46  412.37  193.07  10211852  1056/2557688
reindex *              612.95  415.40  194.97  10211848  0/2555032
T04-thread-subquery.sh: Testing thread subqueries [0.4 large]
                      Wall(s)  Usr(s)  Sys(s)    Res(K)  In/Out(512B)
search thread:{} ...     2.34    2.12    0.21     96452  0/144
search thread:{} ...     2.35    2.17    0.18     96208  0/144
search thread:{} ...     2.33    2.08    0.25     94740  0/144
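A rough way to check the speedup claim is to compute the relative change in wall-clock time for the longer-running rows of the two runs above. The selection of rows below is arbitrary, the numbers are copied from the tables, and the caveat about the benchmark machine not being idle still applies. On these rows the savings range from roughly 6% to 13%.

```python
# Wall-clock times in seconds, copied from the "Existing indexing" and
# "With new indexing" runs above; the choice of rows is arbitrary.
old_wall = {
    "Initial notmuch new": 565.17,
    "load nmbug tags": 16.25,
    "tag * +new_tag": 374.69,
    "tag * -existing_tag": 318.47,
    "reindex * (1st run)": 688.27,
}
new_wall = {
    "Initial notmuch new": 494.31,
    "load nmbug tags": 14.21,
    "tag * +new_tag": 353.31,
    "tag * -existing_tag": 284.07,
    "reindex * (1st run)": 640.19,
}


def percent_less(old, new):
    """Percentage of wall-clock time saved going from old to new."""
    return 100.0 * (old - new) / old


for name, old in old_wall.items():
    new = new_wall[name]
    print(f"{name:22s} {old:7.2f} -> {new:7.2f}  ({percent_less(old, new):4.1f}% less)")
```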