unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
* slowdown in notmuch perf suite with xapian 1.3.5
@ 2016-04-07 11:56 David Bremner
  2016-04-07 23:25 ` Olly Betts
  0 siblings, 1 reply; 5+ messages in thread
From: David Bremner @ 2016-04-07 11:56 UTC (permalink / raw)
  To: notmuch, xapian-discuss


I hadn't noticed any interactive slowdown, but when I got around to
running the notmuch performance suite, there seems to be some noticable
slowdown with the glass backend (default in Xapian 1.3.5) compared to
chert (using xapian 1.2.22)

These tests are on an older i7 with 12G of RAM and an SSD. I'm
reasonable confident they are CPU bound. One curious thing is the
increase in system time in the glass case. It also looks like the glass
backend is doing a lot more I/O, which could be related.

The current notmuch performance corpus has about 200k documents,
totalling about 3.5G. Unfortunately each number here represents only a
single run. I did rerun the tests with the glass backend, and the
variation was reasonably small.

Chert
=====

T00-new.sh: Testing notmuch new                         [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  Initial notmuch new   669.06	639.78	21.42	323684	3576/9360440
  notmuch new #2        0.46	0.00	0.00	8240	3568/200
  notmuch new #3        0.01	0.00	0.00	7916	0/8
  notmuch new #4        0.01	0.01	0.00	8008	0/8
  notmuch new #5        0.01	0.00	0.00	8040	0/8
  notmuch new #6        0.01	0.00	0.00	8040	0/8

T01-dump-restore.sh: Testing dump and restore           [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  load nmbug tags       5.85	2.64	0.10	11280	1376/40496
  dump *                7.45	6.51	0.94	25272	104/27928
  restore *             7.55	7.15	0.39	8180	0/0

T02-tag.sh: Testing tagging                             [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  tag * +new_tag        200.13	183.01	7.08	38628	264/1664552
  tag * +existing_tag   0.00	0.00	0.00	8356	0/0
  tag * -existing_tag   153.47	145.00	4.02	34928	0/1626320
  tag * -missing_tag    0.00	0.00	0.00	8252	0/0

Glass
=====

T00-new.sh: Testing notmuch new                         [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  Initial notmuch new   949.53	697.05	206.98	277436	1290744/21767856
  notmuch new #2        2.12	0.01	0.02	8204	2552/160
  notmuch new #3        0.01	0.00	0.00	8216	0/8
  notmuch new #4        0.01	0.00	0.00	8192	0/8
  notmuch new #5        0.01	0.00	0.00	8216	0/8
  notmuch new #6        0.01	0.00	0.00	8144	0/8

T01-dump-restore.sh: Testing dump and restore           [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  load nmbug tags       10.78	4.06	3.59	11376	600/39832
  dump *                7.44	6.52	0.91	25296	0/27928
  restore *             7.74	7.24	0.48	8740	0/0

T02-tag.sh: Testing tagging                             [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  tag * +new_tag        481.78	278.80	196.89	39448	0/1897360
  tag * +existing_tag   0.00	0.00	0.00	8496	0/0
  tag * -existing_tag   449.58	242.65	202.74	35456	0/2073520
  tag * -missing_tag    0.00	0.00	0.00	8440	0/0

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: slowdown in notmuch perf suite with xapian 1.3.5
  2016-04-07 11:56 slowdown in notmuch perf suite with xapian 1.3.5 David Bremner
@ 2016-04-07 23:25 ` Olly Betts
  2016-04-08  0:40   ` David Bremner
  0 siblings, 1 reply; 5+ messages in thread
From: Olly Betts @ 2016-04-07 23:25 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch, xapian-discuss

On Thu, Apr 07, 2016 at 08:56:46AM -0300, David Bremner wrote:
> I hadn't noticed any interactive slowdown, but when I got around to
> running the notmuch performance suite, there seems to be some noticable
> slowdown with the glass backend (default in Xapian 1.3.5) compared to
> chert (using xapian 1.2.22)

Some of this is pretty much expected, though other parts I don't
entirely understand.

One of the big changes in glass is how the position table is structured.
In chert, it is ordered by (document,term) but in glass that has been
changed to (term,document).

This change makes a huge difference to phrase searches in cases where
a lot of phrase data is needed, but it has an indexing time cost -
adding a new document can no longer just append a load of entries to
the position table, but instead we need to buffer up the changes, and
then merge the entries within the existing table.

The trade-off isn't ideal for everyone, but the cases of slow phrase
searches were a real pain point that needed addressing.  The plan is
to optimise indexing speed in other ways to regain this loss - some
of that has been done but there's a lot more to do still.

So the T00-new.sh numbers make sense - there's more work to do, and
we need to read existing positional data more to insert the new stuff,
so the increased reads and writes make sense.

But guessing at what the other two tests do, I wouldn't expect them to
be affected by this.

I'm also a bit puzzled by how glass can manage not to read any data
for "dump *", and several tests seem to not read or write anything
for either backend.  What exactly are the "In/Out" numbers?

Cheers,
    Olly

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: slowdown in notmuch perf suite with xapian 1.3.5
  2016-04-07 23:25 ` Olly Betts
@ 2016-04-08  0:40   ` David Bremner
  2016-04-08  0:57     ` Olly Betts
  0 siblings, 1 reply; 5+ messages in thread
From: David Bremner @ 2016-04-08  0:40 UTC (permalink / raw)
  To: notmuch, xapian-discuss

Olly Betts <olly@survex.com> writes:

>
> So the T00-new.sh numbers make sense - there's more work to do, and
> we need to read existing positional data more to insert the new stuff,
> so the increased reads and writes make sense.
>
> But guessing at what the other two tests do, I wouldn't expect them to
> be affected by this.

The non-optimized-away cases of T02-tag just adding and deleting terms
to each document with term Tmail

> I'm also a bit puzzled by how glass can manage not to read any data
> for "dump *", and several tests seem to not read or write anything
> for either backend.  What exactly are the "In/Out" numbers?

that's just the output from /usr/bin/time -f '%e\t%U\t%S\t%M\t%I/%O'

The manual describes them as "number of file system
inputs/outputs". From looking at the source, they correspond to
ru_inblock and ru_oublock fields from the getrusage call. AFAIU, that
means the number of non-cached read/writes.

d

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: slowdown in notmuch perf suite with xapian 1.3.5
  2016-04-08  0:40   ` David Bremner
@ 2016-04-08  0:57     ` Olly Betts
  2016-04-08  1:14       ` David Bremner
  0 siblings, 1 reply; 5+ messages in thread
From: Olly Betts @ 2016-04-08  0:57 UTC (permalink / raw)
  To: David Bremner; +Cc: notmuch, xapian-discuss

On Thu, Apr 07, 2016 at 09:40:59PM -0300, David Bremner wrote:
> Olly Betts <olly@survex.com> writes:
> 
> >
> > So the T00-new.sh numbers make sense - there's more work to do, and
> > we need to read existing positional data more to insert the new stuff,
> > so the increased reads and writes make sense.
> >
> > But guessing at what the other two tests do, I wouldn't expect them to
> > be affected by this.
> 
> The non-optimized-away cases of T02-tag just adding and deleting terms
> to each document with term Tmail

That should short-cut to just only changing the data for Tmail.  Perhaps that's
not working correctly - I'll take a look at this, but probably after 1.4.0 is
out.

> > I'm also a bit puzzled by how glass can manage not to read any data
> > for "dump *", and several tests seem to not read or write anything
> > for either backend.  What exactly are the "In/Out" numbers?
> 
> that's just the output from /usr/bin/time -f '%e\t%U\t%S\t%M\t%I/%O'
> 
> The manual describes them as "number of file system
> inputs/outputs". From looking at the source, they correspond to
> ru_inblock and ru_oublock fields from the getrusage call. AFAIU, that
> means the number of non-cached read/writes.

Non-cached reads/writes are arguably the most useful sort to measure, but the
reads at least will be sensitive to OS caching, which means a repeat run will
generally show lower numbers of reads, e.g.:

$ /usr/bin/time -f '%I/%O' wc randomfile 
  240  2908 96780 randomfile
192/0
$ /usr/bin/time -f '%I/%O' wc randomfile 
  240  2908 96780 randomfile
0/0

So those numbers may not be entirely comparable, depending what order your
tests were done in, and whether you'd run the tests (or cloned the repo or some
other operation which read or wrote the files used) recently enough that their
data might still be cached.

Cheers,
    Olly

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: slowdown in notmuch perf suite with xapian 1.3.5
  2016-04-08  0:57     ` Olly Betts
@ 2016-04-08  1:14       ` David Bremner
  0 siblings, 0 replies; 5+ messages in thread
From: David Bremner @ 2016-04-08  1:14 UTC (permalink / raw)
  To: Olly Betts; +Cc: notmuch, xapian-discuss

Olly Betts <olly@survex.com> writes:

> Non-cached reads/writes are arguably the most useful sort to measure, but the
> reads at least will be sensitive to OS caching, which means a repeat run will
> generally show lower numbers of reads, e.g.:
>
> $ /usr/bin/time -f '%I/%O' wc randomfile 
>   240  2908 96780 randomfile
> 192/0
> $ /usr/bin/time -f '%I/%O' wc randomfile 
>   240  2908 96780 randomfile
> 0/0
>
> So those numbers may not be entirely comparable, depending what order your
> tests were done in, and whether you'd run the tests (or cloned the repo or some
> other operation which read or wrote the files used) recently enough that their
> data might still be cached.

Here are the number from second glass run. The order was glass / chert /
glass


T00-new.sh: Testing notmuch new                         [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  Initial notmuch new   920.53	698.96	207.02	245188	3528/22442096
  notmuch new #2        0.55	0.00	0.01	8048	6960/160
  notmuch new #3        0.01	0.00	0.00	8112	0/8
  notmuch new #4        0.01	0.01	0.00	8136	0/8
  notmuch new #5        0.01	0.00	0.00	8140	0/8
  notmuch new #6        0.01	0.00	0.00	8116	0/8

T01-dump-restore.sh: Testing dump and restore           [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  load nmbug tags       8.89	4.23	3.88	11648	368/40072
  dump *                7.37	6.29	1.08	25268	72/27928
  restore *             7.60	7.16	0.43	8624	0/0

T02-tag.sh: Testing tagging                             [0.4 large]
			Wall(s)	Usr(s)	Sys(s)	Res(K)	In/Out(512B)
  tag * +new_tag        474.16	274.89	191.52	34820	16/1920240
  tag * +existing_tag   0.01	0.01	0.00	8480	152/0
  tag * -existing_tag   438.62	239.02	195.44	34928	0/1970160
  tag * -missing_tag    0.00	0.00	0.00	8264	0/0

It's a bit faster overall, but not radically so. So I think cache
effects are not the main issue here.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-04-08  1:14 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-07 11:56 slowdown in notmuch perf suite with xapian 1.3.5 David Bremner
2016-04-07 23:25 ` Olly Betts
2016-04-08  0:40   ` David Bremner
2016-04-08  0:57     ` Olly Betts
2016-04-08  1:14       ` David Bremner

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).