Date: Wed, 29 Apr 2020 09:30:46 +0000
From: Eric Wong
To: Franz Fellner, Don Zickus
Cc: notmuch@notmuchmail.org, xapian-discuss@lists.xapian.org
Subject: Re: performance problems with notmuch new
Message-ID: <20200429093046.GA11038@dcvr>
References: <20200415150801.h2mazyo37sspvech@redhat.com>
 <1587211167-ner-6.432@LappyL520>
 <87imhup6kr.fsf@tethera.net>
In-Reply-To: <87imhup6kr.fsf@tethera.net>
List-Id: "Use and development of the notmuch mail system."

David Bremner wrote:
> Franz Fellner writes:
> > mail takes at least 10 seconds, sometimes even more. It can go into
> > minutes when I get lots of mail (~30...). When I run it after a
> > reboot I can have breakfast while notmuch starts up... This is all on
> > spinning rust. I thought of getting an SSD but not in the near future.
>
> I do have at least one spinning rust configuration with about 300k
> messages, and notmuch new is still fast there.

I've yet to figure out how spinning rust can work well with
giant public-inboxes (git + Xapian + SQLite); but I have a fair
bit of experience with SSDs + Xapian. But some of my
recommendations below come from my experience with HDDs in the
old days, before I used Xapian.

> > What I observe during that time: notmuch doesn't really need much CPU.
> > iotop shows constant read and write with extremely low rates, under
> > 1MB/sec. So I think it might be an issue in xapian?

Seek times, probably.

`iostat -x 1' can give you very useful information about I/O
queue sizes and wait times for reads and writes (the `-x' is the
good stuff :), `1' means it keeps outputting every second.

> Just in case one of the xapian experts can suggest some kind of test for
> why you might be seeing this behaviour, I've included the xapian list in
> CC.

Newer Xapian has a DB_NO_SYNC which notmuch could set as an option.
Users of old Xapian (or on Perl XS bindings) also have the
libeatmydata LD_PRELOAD, which I end up using all the time:

  https://www.flamingspork.com/projects/libeatmydata/

I run `sync' if I have anything important, but I usually don't ;)
I do set the kernel to flush dirty data in the background fairly
aggressively, though (more below).

For public-inbox v2 hacking in 2018 (indexing LKML archives, ~3M
messages), I found working on a freshly TRIM-ed SSD with plenty
of free space made the SSD firmware happier. SSDs can get a LOT
slower as they get fuller (so xapian-compact helps, there, too).

SSD quality matters a lot; even the low-end QLC stuff beats
high-end HDDs in random I/O, but they will slow down more as
they fill up more.

For writes, I set /proc/sys/vm/dirty_background_bytes to 100M or
something reasonably close to what the SSD can write quickly.
Linux tended to hit I/O stalls with lots of dirty data, so making
the kernel flush it sooner tends to help IME. Maybe newer kernels
do better *shrug*; but it's basically the local storage version of
the network "Bufferbloat" problem.

Flushing dirty data more frequently also frees up more memory for
the kernel to make better caching decisions about future/current
data it needs to read.

notmuch can probably run a background thread (or use liburing) to
do POSIX_FADV_DONTNEED once it's done with a message, too (and
POSIX_FADV_WILLNEED for to-be-indexed messages). Uncompressed
Maildir messages eat cache space real quick, which means less
cache for Xapian.

public-inbox indexes the v2 inbox format in parallel; but
excessive parallelism still causes I/O contention with SSDs (at
least upper-mid-range ones). So right now the default limit is
3 indexing processes regardless of CPU count.

Reading from git is still synchronous atm, but will probably be
async in a few months. git itself tends to generate decent I/O
patterns with its pack format (but makes posix_fadvise hinting
impractical).
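
The POSIX_FADV_WILLNEED / POSIX_FADV_DONTNEED idea above can be sketched
in a few lines. This is only an illustration, not something notmuch does
today: the function name `read_and_drop` and the chunk size are made up
for the example, it runs synchronously rather than in a background
thread, and the hints are treated as best-effort (errors are ignored, and
the call is skipped on platforms that lack `os.posix_fadvise`).

```python
import os


def _advise(fd, advice_name):
    """Best-effort page cache hint for the whole file; never fatal."""
    advice = getattr(os, advice_name, None)
    if advice is None or not hasattr(os, "posix_fadvise"):
        return  # platform without posix_fadvise: silently skip the hint
    try:
        # offset=0, len=0 means "from the start to the end of the file"
        os.posix_fadvise(fd, 0, 0, advice)
    except OSError:
        pass  # hints are advisory, so a failure should not stop indexing


def read_and_drop(path, chunk_size=1 << 20):
    """Read one message file, then tell the kernel its pages can go.

    Hypothetical helper: an indexer would parse the returned bytes.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Ask the kernel to start reading the file ahead of our read() calls.
        _advise(fd, "POSIX_FADV_WILLNEED")
        chunks = []
        while True:
            buf = os.read(fd, chunk_size)
            if not buf:
                break
            chunks.append(buf)
        # Done with this message: its cached pages are no longer useful to us.
        _advise(fd, "POSIX_FADV_DONTNEED")
        return b"".join(chunks)
    finally:
        os.close(fd)
```

In a real indexer the WILLNEED hint would more usefully be issued for the
next few to-be-indexed messages while the current one is being parsed, so
the reads overlap with CPU work.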
Anyways, indexing just under 3 million LKML messages took ~4 hours
on a 4-core system built in 2010 with a SATA SSD from 2014.
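
For reference, the dirty_background_bytes tuning mentioned earlier can be
scripted as below. This is a hedged sketch: the helper names are invented
for the example, "100M" is assumed to mean 100 MiB (the message does not
say whether binary or decimal units were meant), and writing the real
/proc file needs root. The `path` parameter exists only so the helpers
can be tried against an ordinary file.

```python
DIRTY_BG_PATH = "/proc/sys/vm/dirty_background_bytes"


def parse_size(text):
    """Convert sizes like '100M' to bytes, assuming binary multiples."""
    units = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def set_dirty_background_bytes(size, path=DIRTY_BG_PATH):
    """Write the background writeback threshold and return it in bytes."""
    value = parse_size(size)
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return value


def get_dirty_background_bytes(path=DIRTY_BG_PATH):
    """Read the current threshold; 0 means the ratio-based knob is in use."""
    with open(path) as f:
        return int(f.read().strip())
```

Run as root, `set_dirty_background_bytes("100M")` has the same effect as
`sysctl vm.dirty_background_bytes=104857600`. The setting does not
survive a reboot unless it is also added to the sysctl configuration.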