unofficial mirror of notmuch@notmuchmail.org
* please eat my data!
@ 2010-04-12 13:33 Sebastian Spaeth
  2010-04-12 14:21 ` Jameson Rollins
  0 siblings, 1 reply; 8+ messages in thread
From: Sebastian Spaeth @ 2010-04-12 13:33 UTC
  To: Notmuch list

fsync is really killing xapian (and notmuch). What suffers are the
boolean prefixes (tag, id, and thread). Using libeatmydata (which
disables fsync) shows a 10x speedup for tagging. The speedup is only a
factor of 2 for e.g. from: searches. This is ext4 on recent stock
Ubuntu. Given that searches by tag and thread are performed really often
(each time I advance a thread, for example), that really hurts.

With a warm file cache and a thread containing 11 messages:

---------------------------------------------------
time notmuch tag +test -- thread:0000000000000f4e
real	0m0.677s
user	0m0.030s
sys	0m0.020s
---------------------------------------------------
time LD_PRELOAD=./libeatmydata.so notmuch tag +test -- thread:0000000000000f4e

real	0m0.040s
user	0m0.020s
sys	0m0.020s
---------------------------------------------------

However, tagging ~850 messages based on a from: search is "ONLY" a factor of 2 faster:
------------------------------------------------------
time notmuch tag +test -- from:"sebastian@sspaeth.de"

real	0m2.355s
user	0m1.240s
sys	0m0.040s
---------------------------------------------------
time LD_PRELOAD=./libeatmydata.so notmuch tag +test -- from:"sebastian@sspaeth.de"

real	0m1.286s
user	0m1.230s
sys	0m0.010s
---------------------------------------------------
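Working the ratios out from the single-run "real" timings above (a back-of-the-envelope check, not a rigorous benchmark), with the absolute time saved alongside:

```latex
\begin{aligned}
\text{thread: query} &\quad \frac{0.677\,\mathrm{s}}{0.040\,\mathrm{s}} \approx 16.9, & 0.677\,\mathrm{s} - 0.040\,\mathrm{s} &= 0.637\,\mathrm{s} \\
\text{from: query}   &\quad \frac{2.355\,\mathrm{s}}{1.286\,\mathrm{s}} \approx 1.83, & 2.355\,\mathrm{s} - 1.286\,\mathrm{s} &= 1.069\,\mathrm{s}
\end{aligned}
```

In the from: run the user time barely moves (1.240s vs 1.230s), so most of the remaining 1.286s is CPU work that fsync does not touch. That suggests the smaller ratio there comes mostly from the larger CPU-bound share rather than from cheaper fsyncs; in absolute terms that run actually saved more time.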

* Re: please eat my data!
  2010-04-12 13:33 please eat my data! Sebastian Spaeth
@ 2010-04-12 14:21 ` Jameson Rollins
  2010-04-12 14:41   ` racin
  2010-04-12 15:24   ` Sebastian Spaeth
  0 siblings, 2 replies; 8+ messages in thread
From: Jameson Rollins @ 2010-04-12 14:21 UTC
  To: Sebastian Spaeth, Notmuch list

[-- Attachment #1: Type: text/plain, Size: 833 bytes --]

On Mon, 12 Apr 2010 15:33:35 +0200, "Sebastian Spaeth" <Sebastian@SSpaeth.de> wrote:
> fsync is really killing xapian (and notmuch). What suffers are the
> boolean prefixes (tag, id, and thread). Using libeatmydata (which
> disables fsync) shows a 10x speedup for tagging. The speedup is only a
> factor of 2 for e.g. from: searches. This is ext4 on recent stock
> Ubuntu. Given that searches by tag and thread are performed really often
> (each time I advance a thread, for example), that really hurts.

Wow, this is really interesting, Sebastian.  For those of us not in the
know, can you explain what libeatmydata is and how it's used?  It sounds
like something I would *not* want to use!  So you didn't have to
recompile here, and only had to set LD_PRELOAD=./libeatmydata.so?  Is
there any drawback to what you're doing here?

jamie.

[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]

* Re: please eat my data!
  2010-04-12 14:21 ` Jameson Rollins
@ 2010-04-12 14:41   ` racin
  2010-04-12 15:24   ` Sebastian Spaeth
  1 sibling, 0 replies; 8+ messages in thread
From: racin @ 2010-04-12 14:41 UTC
  To: Jameson Rollins; +Cc: Notmuch list


----- "Jameson Rollins" <jrollins@finestructure.net> wrote:

> On Mon, 12 Apr 2010 15:33:35 +0200, "Sebastian Spaeth" <Sebastian@SSpaeth.de> wrote:
> > fsync is really killing xapian (and notmuch). What suffers are the
> > boolean prefixes (tag, id, and thread). Using libeatmydata (which
> > disables fsync) shows a 10x speedup for tagging. The speedup is only a
> > factor of 2 for e.g. from: searches. This is ext4 on recent stock
> > Ubuntu. Given that searches by tag and thread are performed really often
> > (each time I advance a thread, for example), that really hurts.
> 
> Wow, this is really interesting, Sebastian.  For those of us not in the
> know, can you explain what libeatmydata is and how it's used?  It sounds
> like something I would *not* want to use!  So you didn't have to
> recompile here, and only had to set LD_PRELOAD=./libeatmydata.so?  Is
> there any drawback to what you're doing here?
> 
> jamie.
> 

It seems to be a small library that implements fsync as a no-op. Using
LD_PRELOAD makes libeatmydata's definition of fsync override libc's. That
makes writes faster, but they are no longer crash-safe.
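A minimal sketch of such a shim, assuming Linux with glibc. The file name nofsync.c is hypothetical, and this is not libeatmydata's actual source (which covers more calls than these two):

```c
/* nofsync.c: hypothetical minimal LD_PRELOAD shim (not libeatmydata's code).
 * Build: gcc -shared -fPIC -o nofsync.so nofsync.c
 * Run:   LD_PRELOAD=./nofsync.so notmuch tag +test -- thread:0000000000000f4e
 */
#include <unistd.h>

/* When this object is preloaded, the dynamic linker binds calls to
 * fsync() here instead of to libc. Returning 0 reports success without
 * asking the kernel to flush the file to stable storage. */
int fsync(int fd)
{
    (void)fd;
    return 0;
}

/* fdatasync() is the data-only variant; stub it the same way. */
int fdatasync(int fd)
{
    (void)fd;
    return 0;
}
```

The price is exactly the crash-safety mentioned above: after a crash or power loss, writes the program believed were on disk may be gone.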

Matthieu

* Re: please eat my data!
  2010-04-12 14:21 ` Jameson Rollins
  2010-04-12 14:41   ` racin
@ 2010-04-12 15:24   ` Sebastian Spaeth
  2010-04-12 17:14     ` Stewart Smith
  1 sibling, 1 reply; 8+ messages in thread
From: Sebastian Spaeth @ 2010-04-12 15:24 UTC
  To: Jameson Rollins, Notmuch list

On 2010-04-12, Jameson Rollins wrote:
> On Mon, 12 Apr 2010 15:33:35 +0200, "Sebastian Spaeth" wrote:
> Wow, this is really interesting, Sebastian.  For those of us not in the
> know, can you explain what libeatmydata is and how it's used?

Hehe, I just got the pointer to it on IRC myself:

http://www.flamingspork.com/projects/libeatmydata/

You download and untar the thing and run "make", which produces
libeatmydata.so. Running a binary foo as LD_PRELOAD=./libeatmydata.so foo
will then effectively turn every fsync into a no-op. Not something you want
on your production systems, but great for testing how much of a penalty
those fsyncs really are.
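To look at the fsync penalty in isolation rather than through notmuch, a hypothetical micro-benchmark like the one below (POSIX, Linux assumed; the function name is made up) times a number of one-byte write()+fsync() pairs. Calling it from a small main and running that program once normally and once with LD_PRELOAD=./libeatmydata.so should show the difference:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: time `rounds` one-byte write()+fsync() pairs on
 * `path`. Returns elapsed wall-clock seconds, or -1.0 on any error. */
double time_fsync_writes(const char *path, int rounds)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1.0;

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < rounds; i++) {
        /* Each fsync() asks the kernel to push the file to stable storage,
         * unless a preloaded library has turned fsync() into a no-op. */
        if (write(fd, "x", 1) != 1 || fsync(fd) != 0) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

The path and the number of rounds are arbitrary; a file on the same filesystem as the notmuch database gives the most relevant numbers.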

What I find interesting is that we have a 2x speedup and a 10x speedup
for different queries. Olly was saying on IRC that both *should* really be
behaving in much the same manner.

Sebastian

* Re: please eat my data!
  2010-04-12 15:24   ` Sebastian Spaeth
@ 2010-04-12 17:14     ` Stewart Smith
  2010-04-12 17:47       ` Dirk Hohndel
  0 siblings, 1 reply; 8+ messages in thread
From: Stewart Smith @ 2010-04-12 17:14 UTC
  To: Sebastian Spaeth, Jameson Rollins, Notmuch list

On Mon, 12 Apr 2010 17:24:35 +0200, "Sebastian Spaeth" <Sebastian@SSpaeth.de> wrote:
> What I find interesting is that we have a 2x speedup and a 10x speedup
> for different queries. Olly was saying on IRC that both *should* really be
> behaving in much the same manner.

Remember that on ext3 (and, I'm pretty sure, ext4) fsync is effectively
the same as sync(). So performance depends on how much dirty data you
have in your cache.
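On Linux, the amount of dirty data waiting to be written back is reported on the Dirty: line of /proc/meminfo. A hypothetical helper (the name dirty_kb is made up) that reads it, so the figure can be checked before a timing run:

```c
#include <stdio.h>

/* Hypothetical helper: return the "Dirty:" value from /proc/meminfo
 * (in kB), or -1 if the file or the line cannot be read. */
long dirty_kb(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;

    char line[256];
    long kb = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Lines look like "Dirty:              1234 kB". */
        if (sscanf(line, "Dirty: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}
```

Running sync (or letting writeback finish) before each timed command should make runs with and without fsync more comparable.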

libeatmydata also gets rid of msync(), O_SYNC, etc.

-- 
Stewart Smith

* Re: please eat my data!
  2010-04-12 17:14     ` Stewart Smith
@ 2010-04-12 17:47       ` Dirk Hohndel
  2010-04-12 23:10         ` Servilio Afre Puentes
  0 siblings, 1 reply; 8+ messages in thread
From: Dirk Hohndel @ 2010-04-12 17:47 UTC
  To: Stewart Smith, Sebastian Spaeth, Jameson Rollins, Notmuch list

On Mon, 12 Apr 2010 10:14:05 -0700, Stewart Smith <stewart@flamingspork.com> wrote:
> On Mon, 12 Apr 2010 17:24:35 +0200, "Sebastian Spaeth" <Sebastian@SSpaeth.de> wrote:
> > What I find interesting is that we have a 2x speedup and a 10x speedup
> > for different queries. Olly was saying on IRC that both *should* really be
> > behaving in much the same manner.
> 
> Remember that on ext3 (and, I'm pretty sure, ext4) fsync is effectively
> the same as sync(). So performance depends on how much dirty data you
> have in your cache.
> 
> libeatmydata also gets rid of msync(), O_SYNC, etc.

Which is why so many of us have started to use BTRFS...

Much smaller performance degradation when doing frequent fsyncs.

/D

-- 
Dirk Hohndel
Intel Open Source Technology Center

* Re: please eat my data!
  2010-04-12 17:47       ` Dirk Hohndel
@ 2010-04-12 23:10         ` Servilio Afre Puentes
  2010-04-12 23:35           ` Dirk Hohndel
  0 siblings, 1 reply; 8+ messages in thread
From: Servilio Afre Puentes @ 2010-04-12 23:10 UTC
  To: Dirk Hohndel; +Cc: Notmuch list

On 12 April 2010 13:47, Dirk Hohndel <hohndel@infradead.org> wrote:
> On Mon, 12 Apr 2010 10:14:05 -0700, Stewart Smith <stewart@flamingspork.com> wrote:
>> On Mon, 12 Apr 2010 17:24:35 +0200, "Sebastian Spaeth" <Sebastian@SSpaeth.de> wrote:
>> > What I find interesting is that we have a 2x speedup and a 10x speedup
>> > for different queries. Olly was saying on IRC that both *should* really be
>> > behaving in much the same manner.
>>
>> Remember that on ext3 (and, I'm pretty sure, ext4) fsync is effectively
>> the same as sync(). So performance depends on how much dirty data you
>> have in your cache.
>>
>> libeatmydata also gets rid of msync(), O_SYNC, etc.
>
> Which is why so many of us have started to use BTRFS...

How stable is it now? What kernel version and distro are you using?

Servilio

* Re: please eat my data!
  2010-04-12 23:10         ` Servilio Afre Puentes
@ 2010-04-12 23:35           ` Dirk Hohndel
  0 siblings, 0 replies; 8+ messages in thread
From: Dirk Hohndel @ 2010-04-12 23:35 UTC
  To: Servilio Afre Puentes; +Cc: Notmuch list

On Mon, 12 Apr 2010 19:10:25 -0400, Servilio Afre Puentes <servilio@gmail.com> wrote:
> On 12 April 2010 13:47, Dirk Hohndel <hohndel@infradead.org> wrote:
> > On Mon, 12 Apr 2010 10:14:05 -0700, Stewart Smith <stewart@flamingspork.com> wrote:
> >> On Mon, 12 Apr 2010 17:24:35 +0200, "Sebastian Spaeth" <Sebastian@SSpaeth.de> wrote:
> >> > What I find interesting is that we have a 2x speedup and a 10x speedup
> >> > for different queries. Olly was saying on IRC that both *should* really be
> >> > behaving in much the same manner.
> >>
> >> Remember that on ext3 (and, I'm pretty sure, ext4) fsync is effectively
> >> the same as sync(). So performance depends on how much dirty data you
> >> have in your cache.
> >>
> >> libeatmydata also gets rid of msync(), O_SYNC, etc.
> >
> > Which is why so many of us have started to use BTRFS...
> 
> How stable is it now? What kernel version and distro are you using?

Several: Fedora 12 with 2.6.34-rc3, a Moblin-2.1 derivative with 2.6.33,
and Debian sid with 2.6.33.

I've been using it for almost everything I do since some point in the
2.6.32 -rc series.

/D

-- 
Dirk Hohndel
Intel Open Source Technology Center

Thread overview: 8+ messages
2010-04-12 13:33 please eat my data! Sebastian Spaeth
2010-04-12 14:21 ` Jameson Rollins
2010-04-12 14:41   ` racin
2010-04-12 15:24   ` Sebastian Spaeth
2010-04-12 17:14     ` Stewart Smith
2010-04-12 17:47       ` Dirk Hohndel
2010-04-12 23:10         ` Servilio Afre Puentes
2010-04-12 23:35           ` Dirk Hohndel

Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/
