From: Eric Wong <e@yhbt.net>
To: meta@public-inbox.org
Subject: [PATCH] doc: add some more tuning notes
Date: Tue, 25 Aug 2020 10:51:20 +0000
Message-ID: <20200825105120.30106-1-e@yhbt.net>
I've learned a thing or three about btrfs in the past few
weeks and remembered some old HDD things, too.
The Xapian MultiDatabase problem will need to be addressed
for 1.7...
---
Documentation/public-inbox-index.pod | 12 ++++++++++--
Documentation/public-inbox-init.pod | 15 +++++++++++----
Documentation/public-inbox-tuning.pod | 21 ++++++++++++++++++---
Documentation/public-inbox-xcpdb.pod | 1 +
4 files changed, 40 insertions(+), 9 deletions(-)
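For anyone checking the arithmetic behind the revised --jobs wording in
the public-inbox-init.pod and public-inbox-index.pod hunks, here is a
minimal Python sketch. It is not the actual (Perl) implementation; the
function names are hypothetical. Clamping to one shard follows the
"single shard (-j1)" wording, and --jobs=0 is deliberately left
unmodeled since the text only says it disables parallel indexing.

```python
import os

# Documented cap: "the number of online CPUs, up to 4"
MAX_DEFAULT_JOBS = 4


def default_jobs(online_cpus=None):
    """Default --jobs value: online CPUs, capped at 4 (hypothetical helper)."""
    if online_cpus is None:
        online_cpus = os.cpu_count() or 1
    return max(1, min(online_cpus, MAX_DEFAULT_JOBS))


def shards_for_jobs(jobs):
    """Shards created for a not-yet-indexed inbox: JOBS - 1, since one job
    is the top-level producer. -j1 is documented as a single shard, so the
    result is clamped to at least 1. --jobs=0 is not modeled here."""
    if jobs < 1:
        raise ValueError("only --jobs >= 1 is modeled in this sketch")
    return max(1, jobs - 1)
```

With 8 online CPUs, default_jobs(8) gives 4 jobs, and shards_for_jobs(4)
gives the "3 shard workers, 1 producer" split stated in the new default.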
diff --git a/Documentation/public-inbox-index.pod b/Documentation/public-inbox-index.pod
index 46a53825..207b2ed8 100644
--- a/Documentation/public-inbox-index.pod
+++ b/Documentation/public-inbox-index.pod
@@ -39,8 +39,12 @@ normal search functionality.
Influences the number of Xapian indexing shards in a
(L<public-inbox-v2-format(5)>) inbox.
+See L<public-inbox-init(1)/--jobs> for a full description
+of sharding.
+
C<--jobs=0> is accepted as of public-inbox 1.6.0 (PENDING)
-to disable parallel indexing.
+to disable parallel indexing regardless of the number of
+pre-existing shards.
If the inbox has not been indexed or initialized, C<JOBS - 1>
shards will be created (one job is always needed for indexing
@@ -133,7 +137,11 @@ Available in public-inbox 1.6.0 (PENDING).
=item --no-fsync
Disables L<fsync(2)> and L<fdatasync(2)> operations on SQLite
-and Xapian. This is only effective with Xapian 1.4+.
+and Xapian. This is only effective with Xapian 1.4+. This is
+primarily intended for systems with low RAM and the small
+(default) C<--batch-size=1m>. Users of large C<--batch-size>
+may even find disabling L<fdatasync(2)> causes too much dirty
+data to accumulate, resulting in latency spikes from writeback.
Available in public-inbox 1.6.0 (PENDING).
diff --git a/Documentation/public-inbox-init.pod b/Documentation/public-inbox-init.pod
index b25dd1e4..24645045 100644
--- a/Documentation/public-inbox-init.pod
+++ b/Documentation/public-inbox-init.pod
@@ -86,14 +86,21 @@ Default: unset, no epochs are skipped
Control the number of Xapian index shards in a
C<-V2> (L<public-inbox-v2-format(5)>) inbox.
-It is useful to use a single shard (C<-j1>) for inboxes on
+It can be useful to use a single shard (C<-j1>) for inboxes on
high-latency storage (e.g. rotational HDD) unless the system has
enough RAM to cache 5-10x the size of the git repository.
-It is generally not useful to specify higher values than the
-default due to contention in the top-level producer process.
+Another approach for HDDs is to use the
+L<public-inbox-index(1)/publicInbox.indexSequentialShard> option
+and many shards, so each shard may fit into the kernel page
+cache. Unfortunately, excessive sharding slows down read-only
+query performance.
-Default: the number of online CPUs, up to 4
+For fast storage, it is generally not useful to specify higher
+values than the default due to the top-level producer process
+being a bottleneck.
+
+Default: the number of online CPUs, up to 4 (3 shard workers, 1 producer)
=item --skip-docdata
diff --git a/Documentation/public-inbox-tuning.pod b/Documentation/public-inbox-tuning.pod
index abc53d1e..e3f2899b 100644
--- a/Documentation/public-inbox-tuning.pod
+++ b/Documentation/public-inbox-tuning.pod
@@ -69,7 +69,8 @@ footprint when indexing on HDDs.
Initializing a mirror with a high C<--jobs> count to create more
shards (in C<-V2> inboxes) will keep each shard smaller and
-reduce its kernel page cache footprint.
+reduce its kernel page cache footprint. Keep in mind excessive
+sharding imposes a performance penalty for read-only queries.
Users with large amounts of RAM are advised to set a large value
for C<publicinbox.indexBatchSize> as documented in
@@ -88,12 +89,21 @@ used by public-inbox are no exception to that.
public-inbox 1.6.0+ disables copy-on-write (CoW) on Xapian and SQLite
indices on btrfs to achieve acceptable performance (even on SSD).
-Disabling copy-on-write also disables checksumming, thus raid1
-(or higher) configurations may corrupt on unsafe shutdowns.
+Disabling copy-on-write also disables checksumming, thus C<raid1>
+(or higher) configurations may be corrupt after unsafe shutdowns.
Fortunately, these SQLite and Xapian indices are designed to
recoverable from git if missing.
+Disabling CoW does not prevent all fragmentation.
+
+Avoid snapshotting subvolumes containing Xapian and/or SQLite indices.
+Snapshots use CoW despite our efforts to disable it, resulting
+in fragmentation.
+
+L<filefrag(8)> can be used to monitor fragmentation, and
+C<btrfs filesystem defragment -fr $INBOX_DIR> may be necessary.
+
Large filesystems benefit significantly from the C<space_cache=v2>
mount option documented in L<btrfs(5)>.
@@ -106,6 +116,11 @@ While SSD read performance is generally good, SSD write performance
degrades as the drive ages and/or gets full. Issuing C<TRIM> commands
via L<fstrim(8)> or similar is required to sustain write performance.
+Users of the Flash-Friendly File System
+L<F2FS|https://en.wikipedia.org/wiki/F2FS> may benefit from
+optimizations found in SQLite 3.21.0+. Benchmarks are greatly
+appreciated.
+
=head2 Read-only daemons
L<public-inbox-httpd(1)>, L<public-inbox-imapd(1)>, and
diff --git a/Documentation/public-inbox-xcpdb.pod b/Documentation/public-inbox-xcpdb.pod
index 52939894..1397a7f4 100644
--- a/Documentation/public-inbox-xcpdb.pod
+++ b/Documentation/public-inbox-xcpdb.pod
@@ -60,6 +60,7 @@ used with C<--compact>.
=item --no-fsync
Disable L<fsync(2)> and L<fdatasync(2)>.
+See L<public-inbox-index(1)/--no-fsync> for caveats.
Available in public-inbox 1.6.0 (PENDING).
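The filefrag(8) monitoring suggested in the public-inbox-tuning.pod hunk
could be scripted roughly as follows. This is a sketch under stated
assumptions: it expects filefrag's summary lines to look like
"PATH: N extents found", and the 1000-extent threshold and function
names are illustrative, not recommendations from public-inbox.

```python
import re

# Assumed filefrag summary format: "<path>: <N> extent(s) found"
_SUMMARY = re.compile(r"^(?P<path>.*): (?P<extents>\d+) extents? found")


def parse_filefrag(output):
    """Map each path in filefrag output to its extent count.
    Lines that do not match the assumed summary format are ignored."""
    counts = {}
    for line in output.splitlines():
        m = _SUMMARY.match(line)
        if m:
            counts[m.group("path")] = int(m.group("extents"))
    return counts


def fragmented(counts, threshold=1000):
    """Return paths whose extent count exceeds the threshold, sorted.
    The default threshold is an arbitrary illustrative value."""
    return sorted(path for path, n in counts.items() if n > threshold)
```

Feeding it the captured output of filefrag run over the index files
yields a list of candidates for
"btrfs filesystem defragment -fr $INBOX_DIR".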