unofficial mirror of notmuch@notmuchmail.org
* Some Xapian tips and thoughts on rebuilding
@ 2010-01-10 17:43 Carl Worth
  2010-01-12  1:46 ` Kan-Ru Chen
  0 siblings, 1 reply; 6+ messages in thread
From: Carl Worth @ 2010-01-10 17:43 UTC (permalink / raw)
  To: notmuch

With the recent change to "database format 1" some users might decide to
rebuild their notmuch database. If so, there are some things I've
learned about Xapian that are good to know before you rebuild. Or maybe
what you read below will encourage you to rebuild your notmuch database.

I think all users of notmuch have been discouraged by how slow it is to
change the tags on messages. Many of you have heard of "Xapian defect
#250" that was causing some slowness here. I'm happy to report that with
initial code from Kan-Ru Chen, Richard Boulton has recently committed a
fix for this bug to Xapian upstream, (after rewriting the fix
substantially, extending the fix to multiple backends, and writing
several new Xapian test cases for it).

However, just upgrading your Xapian library won't necessarily give you
any benefit with notmuch. But you can be assured of getting some benefit
if you upgrade both Xapian and notmuch and rebuild your notmuch
database. The gory details are covered below.

Gory details for getting the Xapian #250 fix benefit with flint
---------------------------------------------------------------
Xapian has a notion of multiple backends which store the data in the
database differently. In the 1.0 versions of Xapian, the default backend
is the "flint" backend. This backend stores the document "length" in
every "posting" entry, (where a posting is effectively a link from a
particular "term" to a particular "document" perhaps with positional
information).

The fix for defect #250 is to update as little as possible when we add
or remove a single term (and hence a posting) to a document. But if this
change also changes the document length, then all postings will
unavoidably need to be updated.

Historically, notmuch hasn't taken any special care with the effect on
"document length" when adding terms for things like tags. The default
treatment is that terms *do* affect document length. But for terms like
tags that don't actually occur in the document content, it makes sense
to record them as having 0 effect on the document length. I recently
fixed notmuch to do so. But you'll have to rebuild your notmuch database
with a recent notmuch in order to get that change.
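
To make the document-length point concrete, here is a toy model in shell.
It is not Xapian or notmuch code; it only assumes that a document's length
is the sum of the per-term counts (wdf values) of its terms, and that a
term is added with a count of 1 unless told otherwise. The doc_length
function is a made-up name for illustration.

```shell
# Toy model only: this does not call Xapian.
# doc_length WDF...: sum the per-term counts to get the document length.
doc_length() {
    total=0
    for wdf in "$@"; do
        total=$((total + wdf))
    done
    echo "$total"
}

doc_length 3 2 1      # three content terms
doc_length 3 2 1 1    # tag term added with a count of 1: length changes
doc_length 3 2 1 0    # tag term added with a count of 0: length unchanged
```

In the last case the length is the same as before the tag was added, which
is the situation where the #250 fix can avoid rewriting every posting.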

But if you rebuild, you might want to use chert instead of flint
----------------------------------------------------------------
I mentioned that "flint" is the default backend in the 1.0 releases of
Xapian. In the development versions that you can check out from the
project's svn repository, there's support for a newer backend named
"chert", (expected to be the default in an upcoming release). To get
Xapian to use chert you need to have the following environment variable
set when doing the initial "notmuch new" to build your database:

	XAPIAN_PREFER_CHERT=1

After that, Xapian will see that your database is chert and will know
how to deal with it. (Except that I have seen that upgrading Xapian
from one svn version to another may result in incompatible changes to
the chert format---so a future Xapian may not be able to read a
previously-created chert database. I assume these format changes won't
happen in stable releases of Xapian.)
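
One way to make sure the variable is set only for the initial build, and
not left behind in your shell, is a small wrapper. This is just a sketch;
build_with_chert is a made-up helper name, not something notmuch or Xapian
provides.

```shell
# Hypothetical helper (not part of notmuch or Xapian): run the given
# command with XAPIAN_PREFER_CHERT=1 set for that one invocation only.
build_with_chert() {
    XAPIAN_PREFER_CHERT=1 "$@"
}

# Intended use for the initial database build:
#   build_with_chert notmuch new
```

Writing "XAPIAN_PREFER_CHERT=1 notmuch new" directly on the command line
has the same effect.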

One thing that's nice about chert compared to flint is that it no longer
stores the document length in every posting. This means it's easier to
get the benefit from the Xapian defect #250 fix. It also means that your
database can be much smaller. For my notmuch database, a flint build is
about 7.0GB while a chert build is only 5.0GB---a very nice change.

Compacting your database
------------------------
One final tip. I recently started experimenting with a Xapian feature
for compacting a database. This is available only via a command-line
program, (named xapian-compact in the 1.0 releases and
xapian-compact-1.1 in the current Xapian from svn). This functionality
is not yet available in the Xapian library interface or else I would
probably make notmuch call it after building the database.

If you want to experiment with xapian-compact, you'll want to call it
with a command something like the following:

     xapian-compact-1.1 --no-renumber ~/mail/.notmuch/xapian ~/mail/.notmuch/xapian-compact

The --no-renumber argument is essential with a notmuch database, since
(as of database format version 1), notmuch stores Xapian document IDs
internally within terms. If you forget this, you'll find that all of
your searches will return results that are unable to locate any of the
filenames corresponding to your mail.

After running the above command, you could then move your existing
.notmuch/xapian away and move .notmuch/xapian-compact in its place to
test, and then discard the original .notmuch/xapian if you're happy with
the result.
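
The swap described above could be sketched as a shell function like the
following. The swap_compacted name and the xapian.orig backup name are
assumptions made for this example. The function refuses to run if
either directory is missing or if a backup already exists, so the original
database is never overwritten.

```shell
# Hypothetical helper: swap a compacted database into place, keeping the
# original as xapian.orig so that it can be restored if anything looks wrong.
# The xapian.orig name is an assumption, not something notmuch defines.
swap_compacted() {
    db_dir=$1
    if [ ! -d "$db_dir/xapian" ] || [ ! -d "$db_dir/xapian-compact" ]; then
        echo "expected both xapian and xapian-compact in $db_dir" >&2
        return 1
    fi
    if [ -e "$db_dir/xapian.orig" ]; then
        echo "$db_dir/xapian.orig already exists, refusing to overwrite" >&2
        return 1
    fi
    mv "$db_dir/xapian" "$db_dir/xapian.orig" &&
        mv "$db_dir/xapian-compact" "$db_dir/xapian"
}

# Intended use:
#   swap_compacted ~/mail/.notmuch
```

Only remove xapian.orig once you have checked that searches still work.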

For me, this compaction took my 5.0GB down to 3.1GB. So my database is
now less than half the size of what I started with under flint, (and can
conceivably be cached entirely within memory on my machine!), which is
quite delightful.
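
For anyone who wants the ratios spelled out, here is the arithmetic as a
small shell sketch. The sizes are the ones quoted in this message,
expressed in tenths of a GB so that plain integer arithmetic is enough;
percent_of is a made-up helper name.

```shell
# Sizes quoted in this message, in tenths of a GB (7.0, 5.0 and 3.1 GB).
flint=70
chert=50
compacted=31

# percent_of PART WHOLE: PART as a whole-number percentage of WHOLE,
# rounded down by integer division.
percent_of() {
    echo $(( $1 * 100 / $2 ))
}

echo "chert vs flint:           $(percent_of "$chert" "$flint")%"
echo "compacted vs chert:       $(percent_of "$compacted" "$chert")%"
echo "compacted chert vs flint: $(percent_of "$compacted" "$flint")%"
```

This prints 71%, 62% and 44% respectively, so the compacted chert database
is a bit under half the size of the flint one.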

I hope the above is helpful, (and yes, clearly we need to get this
content out in other ways such as in a README in the source
distribution, and on the website in some form much better than our
current pipermail-based mailing-list archives).

-Carl

* Re: Some Xapian tips and thoughts on rebuilding
  2010-01-10 17:43 Some Xapian tips and thoughts on rebuilding Carl Worth
@ 2010-01-12  1:46 ` Kan-Ru Chen
  2010-01-12 20:48   ` Carl Worth
  0 siblings, 1 reply; 6+ messages in thread
From: Kan-Ru Chen @ 2010-01-12  1:46 UTC (permalink / raw)
  To: Carl Worth, notmuch

On Sun, 10 Jan 2010 09:43:38 -0800, Carl Worth <cworth@cworth.org> wrote:
> Compacting your database
> ------------------------
> One final tip. I recently started experimenting with a Xapian feature
> for compacting a database. This is available only via a command-line
> program, (named xapian-compact in the 1.0 releases and
> xapian-compact-1.1 in the current Xapian from svn). This functionality
> is not yet available in the Xapian library interface or else I would
> probably make notmuch call it after building the database.
> 
> If you want to experiment with xapian-compact, you'll want to call it
> with a command something like the following:
> 
>      xapian-compact-1.1 --no-renumber ~/mail/.notmuch/xapian ~/mail/.notmuch/xapian-compact
> 
> The --no-renumber argument is essential with a notmuch database, since
> (as of database format version 1), notmuch stores Xapian document IDs
> internally within terms. If you forget this, you'll find that all of
> your searches will return results that are unable to locate any of the
> filenames corresponding to your mail.

After compacting my database, the size shrank significantly, but the
number of messages also changed. Beware that you might lose messages
after compacting if you are trying this.

run on xapian-svn r13824

> 
> After running the above command, you could then move your existing
> .notmuch/xapian away and move .notmuch/xapian-compact in its place to
> test, and then discard the original .notmuch/xapian if you're happy with
> the result.
> 
> For me, this compaction took my 5.0GB down to 3.1GB. So my database is
> now less than half the size of what I started with with flint, (and can
> conceivable be cached entirely within memory on my machine!), which is
> quite delightful.
> 

-- 
Kan-Ru Chen | http://kanru.info

Q: Why are my replies five sentences or less?
A: http://five.sentenc.es/

* Re: Some Xapian tips and thoughts on rebuilding
  2010-01-12  1:46 ` Kan-Ru Chen
@ 2010-01-12 20:48   ` Carl Worth
  2010-07-27 15:43     ` Kan-Ru Chen
  0 siblings, 1 reply; 6+ messages in thread
From: Carl Worth @ 2010-01-12 20:48 UTC (permalink / raw)
  To: Kan-Ru Chen, notmuch

On Tue, 12 Jan 2010 09:46:14 +0800, Kan-Ru Chen <kanru@kanru.info> wrote:
> After compacting my database, the size shrunk significantly, but the
> number of messages also changed. Beware that you might lose messages
> after compacting if you are trying this.
> 
> run on xapian-svn r13824

Yikes. That's very discouraging. I'll try to do some more testing to see
if I can replicate that, and if so, look closer into what is happening.

-Carl

* Re: Some Xapian tips and thoughts on rebuilding
  2010-01-12 20:48   ` Carl Worth
@ 2010-07-27 15:43     ` Kan-Ru Chen
  2010-07-28 12:23       ` Olly Betts
  0 siblings, 1 reply; 6+ messages in thread
From: Kan-Ru Chen @ 2010-07-27 15:43 UTC (permalink / raw)
  To: Carl Worth, notmuch

On Tue, 12 Jan 2010 12:48:27 -0800, Carl Worth <cworth@cworth.org> wrote:
> On Tue, 12 Jan 2010 09:46:14 +0800, Kan-Ru Chen <kanru@kanru.info> wrote:
> > After compacting my database, the size shrunk significantly, but the
> > number of messages also changed. Beware that you might lose messages
> > after compacting if you are trying this.
> > 
> > run on xapian-svn r13824
> 
> Yikes. That's very discouraging. I'll try to do some more testing to see
> if I can replicate that, and if so, look closer into what is happening.
> 
> -Carl

Seems this still does not work as expected.

With the new `notmuch count' command:

     % notmuch count
     131720

After xapian-compact-1.1

     % notmuch count
     1001

And a subsequent `notmuch new'

     % notmuch new
     Processed 6 total files in almost no time.                    
     Added 6 new messages to the database.

     % notmuch count
     A Xapian exception occurred: Value in posting list too large.
     Query string was: 
     0
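
A check like the one above can be scripted so that a changed count is
caught before the original database is discarded. This is only a sketch;
verify_count is a made-up helper name, and it simply compares the output
of whatever counting command it is given against a previously recorded
number.

```shell
# Hypothetical helper: run a counting command and compare its output with
# the count recorded before compaction.
verify_count() {
    expected=$1
    shift
    actual=$("$@") || return 1
    if [ "$actual" != "$expected" ]; then
        echo "count changed: expected $expected, got $actual" >&2
        return 1
    fi
}

# Intended use:
#   before=$(notmuch count)
#   ... compact and swap the database ...
#   verify_count "$before" notmuch count
```

With the numbers above, the check would have failed: 131720 was expected
and 1001 came back.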

- Kanru

* Re: Some Xapian tips and thoughts on rebuilding
  2010-07-27 15:43     ` Kan-Ru Chen
@ 2010-07-28 12:23       ` Olly Betts
  2010-07-28 14:44         ` Kan-Ru Chen
  0 siblings, 1 reply; 6+ messages in thread
From: Olly Betts @ 2010-07-28 12:23 UTC (permalink / raw)
  To: notmuch

Kan-Ru Chen writes: 
> Seems this still does not work as expected.
> 
> With the new `notmuch count' command:
> 
>      % notmuch count
>      131720
> 
> After xapian-compact-1.1
> 
>      % notmuch count
>      1001
> 
> And a subsequent `notmuch new'
> 
>      % notmuch new
>      Processed 6 total files in almost no time.                    
>      Added 6 new messages to the database.
> 
>      % notmuch count
>      A Xapian exception occurred: Value in posting list too large.
>      Query string was: 
>      0

Bear in mind that the Xapian 1.1.x releases were development versions, and it sounds like you
are using a revision somewhere between 1.1.3 and 1.1.4.

If you can reproduce this with 1.2.x (or 1.0.x), I'm happy to investigate.
I'm not sure it's worthwhile to try to isolate a bug in an old development
version - a lot has changed in the last 6+ months.

Cheers,
    Olly

* Re: Some Xapian tips and thoughts on rebuilding
  2010-07-28 12:23       ` Olly Betts
@ 2010-07-28 14:44         ` Kan-Ru Chen
  0 siblings, 0 replies; 6+ messages in thread
From: Kan-Ru Chen @ 2010-07-28 14:44 UTC (permalink / raw)
  To: Olly Betts, notmuch

On Wed, 28 Jul 2010 12:23:35 +0000 (UTC), Olly Betts <olly@survex.com> wrote:
> Bear in mind Xapian 1.1.x were development versions, and it sounds like you
> are using a revision somewhere between 1.1.3 and 1.1.4.
> 
> If you can reproduce this with 1.2.x (or 1.0.x), I'm happy to investigate.
> I'm not sure it's worthwhile to try to isolate a bug in an old development
> version - a lot has changed in the last 6+ months.

Oops, of course it failed. This xapian-compact-1.1 is the same one I
built 7 months ago!

I upgraded to Xapian 1.2.2 from Debian experimental and tested again. Now
my .notmuch is almost half the size and it still works!

Cheers,
Kanru
