From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by arlo.cworth.org (Postfix) with ESMTP id 08E736DE0EDB for ; Thu, 2 Jan 2020 14:46:15 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at cworth.org X-Spam-Flag: NO X-Spam-Score: 0 X-Spam-Level: X-Spam-Status: No, score=0 tagged_above=-999 required=5 tests=[RCVD_IN_DNSWL_NONE=-0.0001] autolearn=disabled Received: from arlo.cworth.org ([127.0.0.1]) by localhost (arlo.cworth.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id oqGm6ka6kRRC for ; Thu, 2 Jan 2020 14:46:13 -0800 (PST) Received: from h2.fbrelay.privateemail.com (h2.fbrelay.privateemail.com [131.153.2.43]) by arlo.cworth.org (Postfix) with ESMTPS id 8E16A6DE0ED6 for ; Thu, 2 Jan 2020 14:46:13 -0800 (PST) Received: from MTA-09-4.privateemail.com (mta-09.privateemail.com [198.54.127.58]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by h1.fbrelay.privateemail.com (Postfix) with ESMTPS id E99B4808B2 for ; Thu, 2 Jan 2020 17:46:11 -0500 (EST) Received: from MTA-09.privateemail.com (localhost [127.0.0.1]) by MTA-09.privateemail.com (Postfix) with ESMTP id EA9B560041; Thu, 2 Jan 2020 17:46:06 -0500 (EST) Received: from [10.15.0.5] (unknown [10.20.151.235]) by MTA-09.privateemail.com (Postfix) with ESMTPA id 7D16660034; Thu, 2 Jan 2020 22:46:06 +0000 (UTC) Subject: Re: Xapian commits unexpectedly slow To: David Bremner , notmuch@notmuchmail.org References: <4b3b642b-8f5b-4e8c-9f29-76d393d45fd6@e10x.net> <87tv5kvv66.fsf@tethera.net> From: Matthew Schauer Message-ID: <6a695b68-6f21-2125-d087-55816e07fffa@e10x.net> Date: Thu, 2 Jan 2020 15:46:04 -0700 User-Agent: Mozilla/5.0 MIME-Version: 1.0 In-Reply-To: <87tv5kvv66.fsf@tethera.net> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV using ClamSMTP X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Jan 2020 22:46:15 -0000 On 12/28/19 5:21 PM, David Bremner wrote: > Matthew Schauer writes: > >> Greetings, >> >> I've been trying to migrate about 25K e-mails to Notmuch, and I'm seeing >> some frustrating performance characteristics that don't seem to match >> with the experience others report. > > 25,000 messages should really not be a strain, spinning rust or no. > >> I'm dumping messages from >> Thunderbird in batches and then running `notmuch new` to add each batch >> to the database. The indexing performance remains okay, at more than >> 200 per second, but after Notmuch has reported it's finished indexing, >> it hangs for as much as several minutes before exiting. A stack trace >> confirms that this is Xapian committing the database, with most of the >> time seemingly spent in `fdatasync`. The time spent grows with the size >> of the database, not the number of e-mails being imported, which means >> this will remain a problem during day-to-day usage. > > It would be interesting if you could report the results of running the > notmuch performance test suite (under performance-test/ in the source). Nifty! Here are the results -- I assume you know how to interpret them better than I do: T00-new.sh: Testing notmuch new [0.4 large] Wall(s) Usr(s) Sys(s) Res(K) In/Out(512B) Initial notmuch new 1163.05 854.26 45.97 444304 2343120/13645200 notmuch new #2 2.23 0.02 0.03 9384 2144/8 notmuch new #3 0.01 0.01 0.00 9460 0/8 notmuch new #4 0.01 0.01 0.00 9428 0/8 notmuch new #5 0.01 0.00 0.00 9468 0/8 notmuch new #6 0.01 0.01 0.00 9692 0/8 new (52374 mv) 1351.01 537.75 235.45 959524 1027288/8531616 new (52374 mv back) 834.15 489.27 213.97 967040 184/4754016 new (52374 cp) 747.23 284.03 105.51 941992 0/4007120 T01-dump-restore.sh: Testing dump and restore [0.4 large] Wall(s) Usr(s) Sys(s) Res(K) In/Out(512B) load nmbug tags 32.64 4.16 3.97 12744 776/38968 dump * 5.02 4.81 0.18 26256 8/27928 restore * 5.94 5.43 0.48 9728 0/0 T02-tag.sh: Testing tagging [0.4 large] Wall(s) Usr(s) Sys(s) Res(K) In/Out(512B) tag * +new_tag 611.53 305.67 229.54 111304 0/1840208 tag * +existing_tag 0.05 0.01 0.00 9340 96/0 tag * -existing_tag 513.58 242.77 215.88 36252 0/1937792 tag * -missing_tag 0.02 0.00 0.01 9332 0/0 T03-reindex.sh: Testing reindexing [0.4 large] Wall(s) Usr(s) Sys(s) Res(K) In/Out(512B) reindex * 1893.02 590.11 150.22 392180 7572912/4620792 reindex * 853.85 440.58 115.60 337320 3072648/2512376 reindex * 629.36 415.50 107.50 337188 1501448/2507848 T04-thread-subquery.sh: Testing thread subqueries [0.4 large] Wall(s) Usr(s) Sys(s) Res(K) In/Out(512B) search thread:{} ... 28.38 8.25 1.49 94304 278064/144 search thread:{} ... 11.25 5.26 0.63 94300 81520/144 search thread:{} ... 3.24 2.94 0.29 94284 0/144 > The other thing I'm curious about is the actual size of the > database. This varies a lot, but in the past pathological performance > has sometimes been linked to indexing things that should not be, > bloating the database. Here are some relevant lines from my import process, showing how long Notmuch thinks it's taking, how long it's actually taking, and the size of the database after each import. The sizes seem reasonable to me. For comparison, the maildir itself is 4.5 GiB after all of the imports. Processed 1057 total files in 1s (759 files/sec.). notmuch new 1.48s user 0.08s system 52% cpu 2.982 total 6.0M .notmuch Processed 1669 total files in 3s (438 files/sec.). notmuch new 3.95s user 0.19s system 63% cpu 6.516 total 27M .notmuch Processed 3338 total files in 9s (359 files/sec.). notmuch new 9.73s user 0.44s system 71% cpu 14.288 total 71M .notmuch Processed 7547 total files in 24s (304 files/sec.). notmuch new 23.82s user 0.97s system 83% cpu 29.521 total 167M .notmuch Processed 8224 total files in 39s (210 files/sec.). notmuch new 35.72s user 1.96s system 52% cpu 1:12.08 total 334M .notmuch Processed 9530 total files in 39s (239 files/sec.). notmuch new 35.10s user 1.88s system 74% cpu 49.630 total 519M .notmuch Processed 6029 total files in 46s (129 files/sec.). notmuch new 26.42s user 1.88s system 24% cpu 1:54.35 total 641M .notmuch Processed 6387 total files in 38s (167 files/sec.). notmuch new 24.29s user 1.69s system 13% cpu 3:10.35 total 706M .notmuch Processed 3113 total files in 10s (308 files/sec.). notmuch new 10.65s user 0.82s system 6% cpu 2:53.25 total 725M .notmuch Processed 410 total files in 1s (344 files/sec.). notmuch new 1.21s user 0.32s system 1% cpu 1:57.68 total 726M .notmuch Processed 317 total files in almost no time. notmuch new 1.09s user 0.34s system 1% cpu 2:22.09 total 727M .notmuch