Date: Tue, 22 Nov 2011 22:20:03 -0500
From: Austin Clements
To: Tom Bulli
Cc: "notmuch@notmuchmail.org"
Subject: Re: Notmuch indexing 21 million emails
Message-ID: <20111123032002.GK9351@mit.edu>
In-Reply-To: <1321930927.73603.YahooMailNeo@web36506.mail.mud.yahoo.com>
References: <1321930927.73603.YahooMailNeo@web36506.mail.mud.yahoo.com>

Quoth Tom Bulli on Nov 21 at 7:02 pm:
> I have a project where I need to search about 21 million emails - and
> decided to use "notmuch" for it. The system is a Debian Squeeze,
> the notmuch version is "0.8-1~bpo60+1" from "kyria's" private
> repository.
>
> I am running the "notmuch new" for approx. 4 days now - and
> according to "notmuch count" it has indexed about 4.5 million
> emails.
>
> Is this expected performance? Is there any way to speed that up?

Currently, notmuch is much more optimized for search than it is for indexing.
This is unfortunate for the initial indexing run, and it seems to be becoming more of a problem over time.

There are a few things you can try. One is to use an SSD if you aren't already, since constructing the index requires a lot of random IO. You can also try libeatmydata to disable fsync calls, which may improve your IO performance, with the obvious caveat that a system crash or power loss during indexing could leave the database corrupted. However, unless you have a lot of RAM, I suspect your index has long since outgrown your buffer cache, so this may have limited impact.

Since you're going to the trouble of indexing 21 million emails, you might want to try 0.10 (in freeze right now, to be released very, very soon). It won't improve your indexing time, but if your searches return non-trivial numbers of results, emails indexed with 0.10 will search much faster.

Sorry I don't have better news, but I hope this helps.
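For a rough sense of scale, here is a back-of-the-envelope estimate using the numbers from the question (about 4.5 million messages indexed after roughly 4 days, out of about 21 million). It is a minimal Python sketch that assumes the indexing rate stays constant; the function name estimate_remaining is hypothetical, just for illustration, and the real rate may well drop further as the database grows beyond what fits in the buffer cache.

```python
from datetime import timedelta


def estimate_remaining(indexed: int, elapsed: timedelta, total: int) -> tuple[float, timedelta]:
    """Return (messages per second, estimated time left).

    Hypothetical helper: assumes the indexing rate observed so far stays
    constant, which is optimistic for a database that no longer fits in RAM.
    """
    rate = indexed / elapsed.total_seconds()           # messages per second so far
    remaining = timedelta(seconds=(total - indexed) / rate)
    return rate, remaining


rate, remaining = estimate_remaining(4_500_000, timedelta(days=4), 21_000_000)
print(f"{rate:.1f} messages/s, about {remaining.days} more days")
```

At the observed rate of roughly 13 messages per second, the remaining 16.5 million messages would take about two more weeks, or roughly 18.7 days in total, and that is under the optimistic constant-rate assumption.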