From: Austin Clements
To: Sascha Silbe, notmuch
Subject: Re: [PATCH 0/3] Speed up notmuch new for unchanged directories
Date: Mon, 25 Jun 2012 13:59:15 -0400
Message-ID: <87pq8n1de4.fsf@awakening.csail.mit.edu>
In-Reply-To: <1340555366-25891-1-git-send-email-sascha-pgp@silbe.org>
References: <1340555366-25891-1-git-send-email-sascha-pgp@silbe.org>

On Sun, 24 Jun 2012, Sascha Silbe wrote:
> All the time I thought what makes "notmuch new" so abysmally slow is the
> stat() for each maildir. But as it continued to be slow even after I
> moved most mails out of 'new' (into 'new-20120624'), I strace'd notmuch
> and noticed it listed even unchanged directories, thereby listing and
> iterating over each and every single of the 900k mails in my mail store.
>
> There's still quite some room for further improvements as it continues
> to take several minutes to scan < 100 new mails in changed directories
> containing < 1000 mails in total.
> Even the rsync run that fetches the
> new mails is faster.

I haven't looked over your patches yet, but this result surprises me.
Could you explain your setup a little more?  How much mail do you have
and across how many directories?  What file system are you using?

I'm also surprised that your new approach helps.  This directory
listing has to be read off disk one way or the other, but listing
directories is the bread-and-butter of file systems, whereas I would
think that Xapian would require more IO to accomplish the same effect.
Does your patch win because you can specifically list subdirectories
out of Xapian, making the IO proportional to the number of
subdirectories instead of the number of subdirectories and files (even
though the constant factors probably favor reading from the file
system)?

I like the idea of these patches, I just want to make sure I have a
firm grip on what's being optimized and why it wins.
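
To make the "proportional to the number of subdirectories" question
concrete, here is a rough Python sketch of the general idea. It is not
notmuch code and not taken from the patches: the scan() function and
the cache dictionary are hypothetical, and the dictionary merely stands
in for whatever the database would store (a directory's last seen mtime
and its list of subdirectories). An unchanged directory costs one
stat() and no listing; only new or changed directories are listed entry
by entry.

```python
import os


def scan(root, cache):
    """Walk the tree under root, listing only new or changed directories.

    cache maps a directory path to (mtime_ns, [subdirectory paths]). It is
    a hypothetical stand-in for what a database could store per directory.
    Returns (changed_dirs, entries_listed).
    """
    changed, listed = [], 0
    stack = [root]
    while stack:
        path = stack.pop()
        mtime = os.stat(path).st_mtime_ns  # one stat() per directory
        known = cache.get(path)
        if known is not None and known[0] == mtime:
            # Unchanged: take the subdirectories from the cache and skip
            # the listing, so the files in this directory are never read.
            stack.extend(known[1])
            continue
        # New or changed: list it, which costs one entry per file and subdir.
        subdirs = []
        with os.scandir(path) as entries:
            for entry in entries:
                listed += 1
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
        cache[path] = (mtime, subdirs)
        changed.append(path)
        stack.extend(subdirs)
    return changed, listed
```

Every subdirectory still has to be stat()ed, because a directory's
mtime only changes when entries are added to, removed from, or renamed
within that directory itself, not when something deeper in the tree
changes. The plain equality check is also a simplification: with coarse
file system timestamps, a change made in the same tick as the scan
could go unnoticed.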