From: David Bremner <david@tethera.net>
To: notmuch@notmuchmail.org
Subject: Re: [PATCH] lib: add 'body:' field, stop indexing headers twice.
Date: Fri, 29 Mar 2019 10:17:35 -0300
Message-ID: <87k1ghyg40.fsf@tethera.net>
In-Reply-To: <20190319003921.5517-1-david@tethera.net>

David Bremner <david@tethera.net> writes:

> This follows a suggestion of Olly Betts to use the facility (since
> Xapian 1.0.4) to add the same field with multiple prefixes. The double
> indexing of previous versions is thus replaced with a query-time
> expansion of unprefixed query terms to the various prefixed
> equivalents.
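To make the trade-off concrete, here is a toy model of the two schemes. It is not notmuch or Xapian code: the index is just a map from prefixed terms to document ids, and the prefixes "XSUBJECT" and "XFROM" are hypothetical stand-ins for header fields, with the empty prefix standing for body text. The old scheme stores each header word twice (prefixed and unprefixed). The new scheme stores it once and ORs the term over all default prefixes at query time. In Xapian itself the corresponding facility is, as far as I understand the API, calling QueryParser::add_prefix() more than once for the same field name (an empty field name covering unprefixed terms).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy inverted index: prefixed term -> set of document ids.
using Index = std::map<std::string, std::set<int>>;

// Hypothetical prefixes searched by an unprefixed query term.
// "" stands for body text; the others stand for header fields.
const std::vector<std::string> kDefaultPrefixes = {"", "XSUBJECT", "XFROM"};

// Old scheme: a header word is indexed under its field prefix and
// again without a prefix, so plain unprefixed lookups can find it.
void index_header_twice(Index &idx, int doc, const std::string &prefix,
                        const std::string &word) {
    idx[prefix + word].insert(doc);
    idx[word].insert(doc);  // the duplicate posting
}

// New scheme: every word is indexed exactly once, under its own prefix.
void index_once(Index &idx, int doc, const std::string &prefix,
                const std::string &word) {
    idx[prefix + word].insert(doc);
}

// Query-time expansion: an unprefixed term becomes the OR of that term
// under every default prefix.
std::set<int> search_expanded(const Index &idx, const std::string &word) {
    std::set<int> hits;
    for (const auto &p : kDefaultPrefixes) {
        auto it = idx.find(p + word);
        if (it != idx.end()) hits.insert(it->second.begin(), it->second.end());
    }
    return hits;
}

// Single lookup of the bare term. In the new scheme this only matches
// body text, i.e. it behaves like the new body: field.
std::set<int> search_plain(const Index &idx, const std::string &word) {
    auto it = idx.find(word);
    return it == idx.end() ? std::set<int>{} : it->second;
}

// Total number of stored (term, document) postings.
std::size_t postings(const Index &idx) {
    std::size_t n = 0;
    for (const auto &kv : idx) n += kv.second.size();
    return n;
}
```

With two messages (one with "patch" in the subject, one with "patch" in the body), both schemes return the same hits for an unprefixed "patch". The new scheme stores fewer postings, and the bare lookup then isolates body matches.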

This patch leads to approximately a 10% decrease in database size on our
performance suite (2.1G -> 1.9G, about 9.5%) before compaction. After
compaction, old -> new is 1.4G -> 1.3G (about 7%).

With the caveat that the benchmark machine was not completely idle, it
also leads to a roughly 10% speedup in wall-clock time (e.g. the initial
notmuch new drops from 565 s to 494 s).

Existing indexing:

T00-new.sh: Testing notmuch new                         [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  Initial notmuch new   565.17	534.82	28.22	474632	0/13854576
  notmuch new #2        0.03	0.00	0.00	9512	0/160
  notmuch new #3        0.00	0.00	0.00	9368	0/8
  notmuch new #4        0.00	0.00	0.00	9412	0/8
  notmuch new #5        0.00	0.00	0.00	9384	0/8
  notmuch new #6        0.00	0.00	0.00	9388	0/8

T01-dump-restore.sh: Testing dump and restore           [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  load nmbug tags       16.25	2.65	3.05	12668	104/40104
  dump *                3.90	3.79	0.10	26048	0/27928
  restore *             4.51	4.10	0.41	9564	0/0

T02-tag.sh: Testing tagging                             [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  tag * +new_tag        374.69	197.56	169.55	118644	0/1818656
  tag * +existing_tag   0.00	0.00	0.00	9232	0/0
  tag * -existing_tag   318.47	151.46	164.56	36260	0/1819584
  tag * -missing_tag    0.00	0.00	0.00	9336	0/0

T03-reindex.sh: Testing tagging                         [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  reindex *             688.27	488.02	197.59	11142680	0/4908120
  reindex *             648.04	456.06	191.78	11139124	0/2696120
  reindex *             650.70	459.08	191.48	11139088	0/2696680

T04-thread-subquery.sh: Testing thread subqueries       [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  search thread:{} ...  2.45	2.29	0.15	94696	0/144
  search thread:{} ...  2.43	2.23	0.20	94228	0/144
  search thread:{} ...  2.46	2.26	0.20	94224	0/144

With new indexing:

T00-new.sh: Testing notmuch new                         [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  Initial notmuch new   494.31	466.96	24.28	447428	0/12093344
  notmuch new #2        0.03	0.00	0.00	9356	0/144
  notmuch new #3        0.01	0.01	0.00	9420	0/8
  notmuch new #4        0.00	0.00	0.00	9388	0/8
  notmuch new #5        0.00	0.00	0.00	9416	0/8
  notmuch new #6        0.01	0.00	0.01	9424	0/8

T01-dump-restore.sh: Testing dump and restore           [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  load nmbug tags       14.21	2.41	2.71	12664	0/38952
  dump *                3.70	3.57	0.12	26092	0/27928
  restore *             4.19	3.78	0.41	9412	0/0

T02-tag.sh: Testing tagging                             [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  tag * +new_tag        353.31	183.89	161.49	111244	0/1693872
  tag * +existing_tag   0.00	0.00	0.00	9316	0/0
  tag * -existing_tag   284.07	137.15	144.33	36712	0/1659200
  tag * -missing_tag    0.00	0.00	0.00	9240	0/0

T03-reindex.sh: Testing tagging                         [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  reindex *             640.19	431.23	196.99	10214564	1510/4504024
  reindex *             611.46	412.37	193.07	10211852	1056/2557688
  reindex *             612.95	415.40	194.97	10211848	0/2555032

T04-thread-subquery.sh: Testing thread subqueries       [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  search thread:{} ...  2.34	2.12	0.21	96452	0/144
  search thread:{} ...  2.35	2.17	0.18	96208	0/144
  search thread:{} ...  2.33	2.08	0.25	94740	0/144
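The per-benchmark savings behind the "roughly 10%" figure can be recomputed from the Wall(s) column. The sketch below copies the first-run wall times from the two sets of tables above (the row labels and the choice of first runs are mine) and computes the percentage of wall-clock time saved. By this measure, the savings range from about 4.5% (thread subquery search) to about 12.5% (initial notmuch new and loading nmbug tags).

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

// First-run wall-clock seconds, copied from the tables above.
struct Timing {
    std::string name;
    double old_wall;  // existing indexing
    double new_wall;  // new indexing
};

const std::vector<Timing> kTimings = {
    {"Initial notmuch new", 565.17, 494.31},
    {"load nmbug tags", 16.25, 14.21},
    {"tag * +new_tag", 374.69, 353.31},
    {"tag * -existing_tag", 318.47, 284.07},
    {"reindex * (run 1)", 688.27, 640.19},
    {"search thread:{} (run 1)", 2.45, 2.34},
};

// Percentage of wall-clock time saved by the new indexing.
double percent_saved(const Timing &t) {
    return 100.0 * (t.old_wall - t.new_wall) / t.old_wall;
}

// Print one line per benchmark, e.g. "Initial notmuch new   12.5%".
void print_report() {
    for (const auto &t : kTimings)
        std::printf("%-26s %5.1f%%\n", t.name.c_str(), percent_saved(t));
}
```

Given the noisy benchmark machine, differences of a few percent between individual rows should not be over-interpreted.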
