Date: Thu, 29 Jul 2021 21:13:21 +0000
From: Eric Wong
To: Konstantin Ryabitsev
Cc: meta@public-inbox.org
Subject: Re: --batch-size and --jobs combination
Message-ID: <20210729211321.GA23521@dcvr>
References: <20210729202836.7qdwxojjel6jmxh6@nitro.local>
In-Reply-To: <20210729202836.7qdwxojjel6jmxh6@nitro.local>

Konstantin Ryabitsev wrote:
> Hello:
>
> Is there any specific logic for mixing --batch-size and --jobs? On a system
> with plenty of CPUs and lots of RAM, does it make sense to have more --jobs,
> larger --batch-size, or some balance of both?

In your case, --jobs will be bound by I/O capability. SATA-2 vs. SATA-3 vs.
NVMe makes a notable difference, as does the quality of the device (MLC, TLC,
or QLC flash; cache and controller).

Xapian seems to do better with bigger batch sizes, up to a point. I'm not sure
I have enough RAM to accurately test batch sizes above 8m (since we also need
to account for kernel caching).

  --batch-size * (--jobs - 1) = rough total batch size

If it's the initial index creation, I would definitely use --no-fsync, too.
Perhaps that should be the default for new indices.

Also note: the recent RFC for --sequential-commit doesn't seem to be working
out performance-wise on my SATA-2 system, but I'm also not sure about its
effect on SSD life/degradation.
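The batch-size * (jobs - 1) rule of thumb can be turned into a quick
back-of-the-envelope calculator for comparing --jobs/--batch-size combinations.
This is a minimal Python sketch, not part of public-inbox: parse_size and
rough_total_batch are hypothetical helpers, treating the k/m/g suffixes as
powers of 1024 is an assumption, and the result ignores kernel page cache.

```python
# Hypothetical helpers (not part of public-inbox) applying the rule of thumb
#   batch-size * (jobs - 1) = rough total batch size
# ASSUMPTION: k/m/g suffixes are binary (powers of 1024).
SUFFIXES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text: str) -> int:
    """Parse '8m', '512k', '1g' or a plain byte count into bytes."""
    text = text.strip().lower()
    # Only treat the last character as a unit if it is a known suffix.
    suffix = text[-1] if text and text[-1] in "kmg" else ""
    number = text[: len(text) - len(suffix)]
    return int(number) * SUFFIXES[suffix]


def rough_total_batch(batch_size: str, jobs: int) -> int:
    """Rough total batch size in bytes across all (jobs - 1) workers."""
    if jobs < 2:
        # The rule of thumb gives 0 for jobs=1, so refuse rather than mislead.
        raise ValueError("rule of thumb assumes jobs >= 2")
    return parse_size(batch_size) * (jobs - 1)


if __name__ == "__main__":
    # Compare a few combinations on the same machine.
    for size, jobs in [("8m", 5), ("16m", 3), ("4m", 9)]:
        total = rough_total_batch(size, jobs)
        print(f"--batch-size={size} --jobs={jobs}: ~{total / 1024 ** 2:.0f} MiB")
```

For example, rough_total_batch("8m", 5) returns 33554432 bytes (32 MiB), the
same total as --batch-size=16m with --jobs=3; the message above suggests the
trade-off between them depends mostly on the storage device.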