From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00 shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id A15071F66E for ; Sat, 15 Aug 2020 05:21:02 +0000 (UTC) From: Eric Wong To: meta@public-inbox.org Subject: [PATCH] doc: add public-inbox-tuning(7) manpage Date: Sat, 15 Aug 2020 05:21:02 +0000 Message-Id: <20200815052102.4178-1-e@yhbt.net> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit List-Id: Determining storage device speed and latencies doesn't seem portable or even possible with the wide variety of storage layers in use. This means we need to write a tuning document and hope users read and improve on it :P --- Documentation/public-inbox-tuning.pod | 139 +++++++++++++++++++++++ Documentation/public-inbox-v2-format.pod | 6 +- MANIFEST | 1 + Makefile.PL | 2 +- 4 files changed, 144 insertions(+), 4 deletions(-) create mode 100644 Documentation/public-inbox-tuning.pod diff --git a/Documentation/public-inbox-tuning.pod b/Documentation/public-inbox-tuning.pod new file mode 100644 index 00000000..abc53d1e --- /dev/null +++ b/Documentation/public-inbox-tuning.pod @@ -0,0 +1,139 @@ +=head1 NAME + +public-inbox-tuning - tuning public-inbox + +=head1 DESCRIPTION + +public-inbox intends to support a wide variety of hardware. While +we strive to provide the best out-of-the-box performance possible, +tuning knobs are an unfortunate necessity in some cases. 
+ +=over 4 + +=item 1 + +New inboxes: public-inbox-init -V2 + +=item 2 + +Process spawning + +=item 3 + +Performance on rotational hard disk drives + +=item 4 + +Btrfs (and possibly other copy-on-write filesystems) + +=item 5 + +Performance on solid state drives + +=item 6 + +Read-only daemons + +=back + +=head2 New inboxes: public-inbox-init -V2 + +If you're starting a new inbox (and not mirroring an existing one), +the L<-V2|public-inbox-v2-format(5)> format requires L, but is +orders of magnitude more scalable than the original C<-V1> format. + +=head2 Process spawning + +Our optional use of L speeds up subprocess spawning from +large daemon processes. + +To enable L, either set the C +environment variable to point to a writable directory, or create +C<~/.cache/public-inbox/inline-c> for any user(s) running +public-inbox processes. + +More (optional) L use will be introduced in the future +to lower memory use and improve scalability. + +=head2 Performance on rotational hard disk drives + +Random I/O performance is poor on rotational HDDs. Xapian indexing +performance degrades significantly as DBs grow larger than available +RAM. Attempts to parallelize random I/O on HDDs lead to pathological +slowdowns as inboxes grow. + +While C<-V2> introduced Xapian shards as a parallelization +mechanism for SSDs, enabling C +repurposes sharding as a mechanism to reduce the kernel page cache +footprint when indexing on HDDs. + +Initializing a mirror with a high C<--jobs> count to create more +shards (in C<-V2> inboxes) will keep each shard smaller and +reduce its kernel page cache footprint. + +Users with large amounts of RAM are advised to set a large value +for C as documented in +L. + +C users on Linux 4.0+ are advised to try the +C<--perf-same_cpu_crypt> C<--perf-submit_from_crypt_cpus> +switches of L to reduce I/O contention from +kernel workqueue threads. 
+ +=head2 Btrfs (and possibly other copy-on-write filesystems) + +L performance degrades from fragmentation when using +large databases and random writes. The Xapian + SQLite indices +used by public-inbox are no exception to that. + +public-inbox 1.6.0+ disables copy-on-write (CoW) on Xapian and SQLite +indices on btrfs to achieve acceptable performance (even on SSD). +Disabling copy-on-write also disables checksumming, thus raid1 +(or higher) configurations may be corrupted on unsafe shutdowns. + +Fortunately, these SQLite and Xapian indices are designed to be +recoverable from git if missing. + +Large filesystems benefit significantly from the C +mount option documented in L. + +Older, non-CoW filesystems generally work well out-of-the-box +for our Xapian and SQLite indices. + +=head2 Performance on solid state drives + +While SSD read performance is generally good, SSD write performance +degrades as the drive ages and/or gets full. Issuing C commands +via L or similar is required to sustain write performance. + +=head2 Read-only daemons + +L, L, and +L are all designed for C10K (or higher) +levels of concurrency from a single process. SMP systems may +use C<--worker-processes=NUM> as documented in L +for parallelism. + +The open file descriptor limit (C, C in L, +C in L) may need to be raised to +accommodate many concurrent clients. + +Transport Layer Security (IMAPS, NNTPS, or via STARTTLS) significantly +increases memory use of client sockets; be sure to account for that in +capacity planning. + +=head1 CONTACT + +Feedback encouraged via plain-text mail to L + +Information for *BSDs and non-traditional filesystems especially +welcome. 
+ +Our archives are hosted at L, +L, and other places + +=head1 COPYRIGHT + +Copyright 2020 all contributors L + +License: AGPL-3.0+ L diff --git a/Documentation/public-inbox-v2-format.pod b/Documentation/public-inbox-v2-format.pod index 6876989c..86a9b8f2 100644 --- a/Documentation/public-inbox-v2-format.pod +++ b/Documentation/public-inbox-v2-format.pod @@ -117,9 +117,9 @@ Rotational storage devices perform significantly worse than solid state storage for indexing of large mail archives; but are fine for backup and usable for small instances. -As of public-inbox 1.6.0, the C<--sequential-shard> option of -L may be used with a high shard count -to ensure individual shards fit into page cache when the entire +As of public-inbox 1.6.0, the C +option of L may be used with a high shard +count to ensure individual shards fit into page cache when the entire Xapian DB cannot. Our use of the L requires Xapian document IDs to diff --git a/MANIFEST b/MANIFEST index 3d690177..6cb5f6bf 100644 --- a/MANIFEST +++ b/MANIFEST @@ -35,6 +35,7 @@ Documentation/public-inbox-mda.pod Documentation/public-inbox-nntpd.pod Documentation/public-inbox-overview.pod Documentation/public-inbox-purge.pod +Documentation/public-inbox-tuning.pod Documentation/public-inbox-v1-format.pod Documentation/public-inbox-v2-format.pod Documentation/public-inbox-watch.pod diff --git a/Makefile.PL b/Makefile.PL index 831649f9..88da5b45 100644 --- a/Makefile.PL +++ b/Makefile.PL @@ -34,7 +34,7 @@ $v->{my_syntax} = [map { "$_.syntax" } @syn]; $v->{-m1} = [ map { (split('/'))[-1] } @EXE_FILES ]; $v->{-m5} = [ qw(public-inbox-config public-inbox-v1-format public-inbox-v2-format) ]; -$v->{-m7} = [ qw(public-inbox-overview) ]; +$v->{-m7} = [ qw(public-inbox-overview public-inbox-tuning) ]; $v->{-m8} = [ qw(public-inbox-daemon) ]; my @sections = (1, 5, 7, 8); $v->{check_80} = [];
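
Not part of the patch: a minimal shell sketch of two checks the new manpage describes, namely creating the per-user C<~/.cache/public-inbox/inline-c> directory from "Process spawning" and inspecting the open file descriptor limit from "Read-only daemons". The variable names (inline_dir, nofile) are illustrative assumptions, and the script only reports the current soft limit; how to raise it depends on the init system and is not shown.

```shell
#!/bin/sh
# Sketch only; run as the user that runs public-inbox processes.
set -eu

# Directory named in the "Process spawning" section of the manpage.
inline_dir="${HOME}/.cache/public-inbox/inline-c"
mkdir -p "$inline_dir"

# The manpage asks for a writable directory; confirm that it is one.
if [ -w "$inline_dir" ]; then
	echo "Inline::C cache directory ready: $inline_dir"
else
	echo "not writable: $inline_dir" >&2
fi

# Soft open file descriptor limit for this shell (a number or "unlimited").
nofile=$(ulimit -n)
echo "soft open file descriptor limit: $nofile"
```

Running this once per daemon user should be enough for the directory; the limit shown is that of the invoking shell and may differ from what a service manager grants the daemons.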