unofficial mirror of notmuch@notmuchmail.org
* "notmuch compact" questions
From: Andy Smith @ 2023-06-17 21:33 UTC
  To: notmuch

Hi,

I'm using v0.31.4 on Debian 11. I have ~3.9 million messages in my
archive and the notmuch database currently takes up 85GiB (though
actually "only" 51GiB due to btrfs zstd:1 compression).

I did remove a few hundred thousand messages from my archive but the
space used by the database did not go down at all.

If I ran "notmuch compact" should I expect any space to be
reclaimed?

The manual page says that this builds a new copy of the database and
then switches them over. Does that imply that I will need nearly the
same amount of space again to perform the compact, until it finishes
and the old database is discarded?

The manual page says that the new database is built "in a temporary
directory". Where is that directory exactly? Is it inside the
current notmuch database directory or is it in $TMPDIR? I ask
because it looks like I'll need to make sure that there is about 50GiB
of space available wherever that is.

I'm aware that this procedure is going to take a really really long
time. If my machine should crash, or the notmuch process runs out of
memory or something, will my database be left in a functional state?
If I have to reindex it, that is going to take even longer, so I
have to think about how much I want what is probably a very marginal
amount of space back!

Thanks,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting


* Re: "notmuch compact" questions
From: David Bremner @ 2023-06-18  6:57 UTC
  To: Andy Smith, notmuch

Andy Smith <andy@strugglers.net> writes:

> If I ran "notmuch compact" should I expect any space to be
> reclaimed?

Yes. If the database was built incrementally, you can expect savings of
at least 20%, and in some cases more than 50%. If you had just run
"notmuch new" on an empty database, the savings would be smaller (but
still some). I vaguely remember that a compacted database can be
slightly slower, but no one has complained that the slowdown is
objectionable.

> The manual page says that this builds a new copy of the database and
> then switches them over. Does that imply that I will need nearly the
> same amount of space again to perform the compact, until it finishes
> and the old database is discarded?

Yes.

> The manual page says that the new database is built "in a temporary
> directory". Where is that directory exactly? Is it inside the
> current notmuch database directory or is it in $TMPDIR? I ask
> because it looks like I'll need to make sure that there is about 50GiB
> of space available wherever that is.

IIRC, it is in the notmuch database directory; see below.

> I'm aware that this procedure is going to take a really really long
> time. If my machine should crash, or the notmuch process runs out of
> memory or something, will my database be left in a functional state?

Yes, the compaction is atomic, within the limits of rename on your file
system (that is why it has to be in the same directory as your existing
database). The original database will be there if something crashes.
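
Since the compacted copy is written next to the existing database, free
space can be checked on that filesystem before starting. The following
is a minimal sketch, not part of notmuch: the function name
`enough_space_for_compact` is hypothetical, and it assumes its argument
is the directory that contains the `xapian` subdirectory. It treats the
current size of `xapian` as an upper bound, which is pessimistic given
that the compacted copy is expected to be smaller.

```shell
# Hypothetical helper (not shipped with notmuch).
# $1: the directory that contains xapian/ (an assumption about your layout).
# Succeeds if the filesystem holding $1 has at least as many free KiB
# as the xapian/ subdirectory currently occupies.
enough_space_for_compact() {
    dir=$1
    if [ ! -d "$dir/xapian" ]; then
        echo "no xapian directory under $dir" >&2
        return 2
    fi
    # du -sk: total size in KiB; df -Pk: POSIX format, column 4 = available KiB
    used_kb=$(du -sk "$dir/xapian" | awk '{print $1}')
    free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    echo "xapian uses ${used_kb}K, filesystem has ${free_kb}K free"
    [ "$free_kb" -ge "$used_kb" ]
}
```

For example, `enough_space_for_compact "$HOME/.notmuch" && notmuch compact`
would start the compaction only when the check passes; `$HOME/.notmuch`
here stands in for wherever your `xapian` directory lives.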

> If I have to reindex it, that is going to take even longer, so I
> have to think about how much I want what is probably a very marginal
> amount of space back!

Compaction should be substantially faster than re-indexing.


* Re: "notmuch compact" questions
From: Michael J Gruber @ 2023-06-18 11:36 UTC
  To: Andy Smith; +Cc: notmuch

On Sat, 17 June 2023 at 23:49, Andy Smith <andy@strugglers.net> wrote:
>
> Hi,
>
> I'm using v0.31.4 on Debian 11. I have ~3.9 million messages in my
> archive and the notmuch database currently takes up 85GiB (though
> actually "only" 51GiB due to btrfs zstd:1 compression).

Wow ;)

> I did remove a few hundred thousand messages from my archive but the
> space used by the database did not go down at all.
>
> If I ran "notmuch compact" should I expect any space to be
> reclaimed?
>
> The manual page says that this builds a new copy of the database and
> then switches them over. Does that imply that I will need nearly the
> same amount of space again to perform the compact, until it finishes
> and the old database is discarded?

Yes.

> The manual page says that the new database is built "in a temporary
> directory". Where is that directory exactly? Is it inside the
> current notmuch database directory or is it in $TMPDIR? I ask

Your notmuch db has a Xapian subdir, by default named `xapian`. The
compaction is done in a new dir `xapian.compact`, which is a sibling
dir to `xapian`.
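
To see whether that working directory currently exists (for instance
while a compaction is running or, presumably, left over after an
interrupted run), a small check like the following can be used. It is
only a sketch: `compact_workdir_status` is a made-up name, and the
argument is assumed to be the directory holding `xapian`.

```shell
# Hypothetical helper (not shipped with notmuch).
# $1: the directory that contains xapian/ (an assumption about your layout).
# Reports whether the sibling working directory xapian.compact exists.
compact_workdir_status() {
    dir=$1
    if [ -d "$dir/xapian.compact" ]; then
        echo "present: $dir/xapian.compact"
    else
        echo "absent: $dir/xapian.compact"
    fi
}
```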

> because it looks like I'll need to make sure that there is about 50GiB
> of space available wherever that is.
>
> I'm aware that this procedure is going to take a really really long
> time. If my machine should crash, or the notmuch process runs out of
> memory or something, will my database be left in a functional state?

You can tell `notmuch compact` to move the existing db to a backup dir
(on the same file system) after a successful compaction.

So, if you don't want to take the risk of `notmuch compact` wrongly
considering the process to be successful, just specify a backup dir.
Then either the compaction fails, in which case the original db is
untouched (after a hard crash a file lock might be stale, but that
should be it), or it succeeds and the old db is in the backup dir.
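
The option in question is `--backup=DIRECTORY`; per the notmuch-compact
manual page, the backup directory must not already exist and must be on
the same filesystem as the database. A minimal sketch of checking both
conditions beforehand follows. `check_backup_dir` is a hypothetical
helper, the paths in the usage comment are only examples, and
`stat -c %d` assumes GNU coreutils (as on Debian).

```shell
# Hypothetical pre-check for a directory to be passed to
# `notmuch compact --backup=DIR` (not shipped with notmuch).
# $1: the directory that contains xapian/, $2: proposed backup directory.
check_backup_dir() {
    dbdir=$1
    backup=$2
    if [ -e "$backup" ]; then
        echo "$backup already exists" >&2
        return 1
    fi
    parent=$(dirname "$backup")
    if [ ! -d "$parent" ]; then
        echo "$parent does not exist" >&2
        return 1
    fi
    # Same filesystem: compare device numbers (stat -c is GNU coreutils).
    if [ "$(stat -c %d "$dbdir")" != "$(stat -c %d "$parent")" ]; then
        echo "$backup is not on the same filesystem as $dbdir" >&2
        return 1
    fi
}

# Usage (example paths, not run here):
#   check_backup_dir "$HOME/.notmuch" "$HOME/.notmuch/xapian.backup" &&
#       notmuch compact --backup="$HOME/.notmuch/xapian.backup"
```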

Disclaimer: Back up manually to a different fs before ... or just take
a btrfs snapshot.

> If I have to reindex it, that is going to take even longer, so I
> have to think about how much I want what is probably a very marginal
> amount of space back!

On my comparatively small db, gains are typically substantial
(postlist, position), even though I compact from time to time.

Expect 4 tables to be compacted (if you want to start a trial run, say).

Cheers
Michael


* Re: "notmuch compact" questions
From: Andy Smith @ 2023-06-18 14:36 UTC
  To: notmuch

Hi Michael,

On Sun, Jun 18, 2023 at 01:36:30PM +0200, Michael J Gruber wrote:
> On my comparatively small db, gains are typically substantial
> (postlist, position), even though I compact from time to time.

Thanks for your help. The savings were indeed a lot more than I
expected (and the compact didn't take anywhere near as long as I
thought it would, either):

$ sudo compsize ~/.notmuch
Processed 7 files, 5159972 regular extents (5484004 refs), 2 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       60%       51G          85G          56G       
none       100%       18G          18G          16G       
zstd        49%       32G          66G          39G
$ notmuch compact
Compacting database...
compacting table postlist
     Reduced by 62% 8838736K (14138192K -> 5299456K)
compacting table docdata
     Reduced by 50% 1168K (2304K -> 1136K)
compacting table termlist
     Reduced by 51% 6816008K (13112728K -> 6296720K)
compacting table position
     Reduced by 54% 17223112K (31731384K -> 14508272K)
compacting table spelling
     doesn't exist
compacting table synonym
     doesn't exist
Done.
$ sudo compsize ~/.notmuch
Processed 7 files, 186246 regular extents (187018 refs), 2 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       56%       14G          25G          25G       
none       100%      3.7G         3.7G         3.7G       
zstd        48%       10G          21G          21G

I don't know why so much space should have been saved, though it may
be something to do with the fact that I indexed everything and then
decided to move a million or so messages around into a different
folder structure.
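
The per-table figures above can be totalled mechanically. The sketch
below sums the "Reduced by" lines of `notmuch compact` output read from
standard input; `summarize_compact` is a made-up name, and the parsing
assumes the output format shown in this message.

```shell
# Hypothetical helper (not shipped with notmuch).
# Reads `notmuch compact` output on stdin and prints the combined
# before/after size of all tables that were reduced.
summarize_compact() {
    awk '/Reduced by/ {
        # "Reduced by 62% 8838736K (14138192K -> 5299456K)" becomes
        # "Reduced by 62% 8838736 14138192 -> 5299456"
        gsub(/[()K]/, "")
        before += $5
        after += $7
    }
    END {
        if (before > 0)
            printf "%dK -> %dK (%d%% smaller)\n",
                   before, after, 100 * (before - after) / before
    }'
}
```

Fed the four "Reduced by" lines from the run above, it prints
`58984608K -> 26105584K (55% smaller)`, i.e. roughly 56GiB down to
25GiB, which matches the "Referenced" column of compsize.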

Thanks,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting


* Re: "notmuch compact" questions
From: Carl Worth @ 2023-06-19  6:51 UTC
  To: Andy Smith, notmuch

Hi Andy,

I'm really glad the compaction was so successful!

I think the Xapian folks deserve most of the credit for this, more than
any code specific to notmuch (other than the user interface, safety
checking, backup, etc.).

I'm really glad you're finding notmuch helpful. And I'm glad you felt
comfortable reaching out for guidance.

Enjoy your new hard drive space!

-Carl

