From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id DDD26431FBC for ; Tue, 16 Oct 2012 07:55:12 -0700 (PDT) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.799 X-Spam-Level: X-Spam-Status: No, score=-0.799 tagged_above=-999 required=5 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_LOW=-0.7] autolearn=disabled Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 38H4adXAWhIE for ; Tue, 16 Oct 2012 07:55:10 -0700 (PDT) Received: from mail-wi0-f179.google.com (mail-wi0-f179.google.com [209.85.212.179]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by olra.theworths.org (Postfix) with ESMTPS id 7751B431FB6 for ; Tue, 16 Oct 2012 07:55:10 -0700 (PDT) Received: by mail-wi0-f179.google.com with SMTP id hq7so2923337wib.2 for ; Tue, 16 Oct 2012 07:55:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:subject:message-id:mail-followup-to:references :mime-version:content-type:content-disposition :content-transfer-encoding:in-reply-to:user-agent; bh=BJhNgeEvF4Du1+GeH7pci0NDb7w7CuRZCoGthXiZZ1M=; b=Q1kTKKVckQE47xaKElMw3AopEf6QMJ4hHlSgOKm9+iDxHPqP7U6D7bjfha3NjDVsyW XAlWLXFugnQDG6CPyxnNoOVA/T7SEiLjWE+myuH0T3D//YHDnET83KwcKlzzc1xihclM jcPYWxt6sPhZIdVB7Z/GhaE6bFym5giRe8TFoJlHEKCgFnUmia+1sTkza5X0EPE4aobc 3M0s9SAYzV6VOjxFQPZXnO0sIFhLom0AY006GfHuq1PipXa9GnJcg4NryoYS0Iv+eDf6 kg3JvpK+sZK7yT2QXiDtjG+yobHQgMK2q3oEabDIiInl6DayUY8p9cG9vWVpYTQqP7Vv vQEw== Received: by 10.180.80.104 with SMTP id q8mr32528965wix.6.1350399307879; Tue, 16 Oct 2012 07:55:07 -0700 (PDT) Received: from kuru.dyndns-at-home.com (pb-d-128-141-52-183.cern.ch. [128.141.52.183]) by mx.google.com with ESMTPS id f1sm19474364wiy.2.2012.10.16.07.55.05 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 16 Oct 2012 07:55:06 -0700 (PDT) Sender: suvayu ali Date: Tue, 16 Oct 2012 16:55:03 +0200 From: Suvayu Ali To: notmuch@notmuchmail.org Subject: Re: nbook: a notmuch based address book written in python Message-ID: <20121016145503.GC11488@kuru.dyndns-at-home.com> Mail-Followup-To: notmuch@notmuchmail.org References: <20120924082646.GA10577@kuru.dyndns-at-home.com> <20120925104457.12264.30350@megatron> <20121008093429.GC4534@kuru.dyndns-at-home.com> <20121013165851.29671.29869@brick.lan> <20121015105830.12412.43278@thinkbox.jade-hamburg.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20121015105830.12412.43278@thinkbox.jade-hamburg.de> User-Agent: Mutt/1.5.21 (2011-07-01) X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Oct 2012 14:55:13 -0000 Hi Justus, I finally had time to go through your response carefully. On Mon, Oct 15, 2012 at 12:58:30PM +0200, Justus Winter wrote: > > > > > ------------------------------- > > > > [~] time nbook Patrick > > > > > > > > Error opening /home/pazz/mail/gmail/[Google Mail].All Mail/cur/1330682270_0.12958.megatron,U=8766,FMD5=66ff6a8bc18a8a3ac4b311daa93d358a:2,S: Too many open files > > > > Traceback (most recent call last): > > > > File "/home/pazz/bin/nbook", line 167, in > > > > File "/home/pazz/bin/nbook", line 71, in __init__ > > > > File "/home/pazz/.local/lib/python2.7/site-packages/notmuch/message.py", line 233, in get_header > > > > notmuch.errors.NullPointerError [...] > > As mentioned before, I think you invalidate the Database object concurrently > > while your long-running algorithm goes through all messages. > > Xapian doesn't handle concurrent access to the index like a normalâ„¢ database would. > > This means you are notified by this error that some changes were detected. > > Maybe the error message should be more telling here though. Teythoon? > > The reason for this error is exactly what the error message says, you > are opening to many files. Check out this limit using ulimit -n: > > % ulimit -n > 4096 > > This problem is subtle. Here is a minimal test case: > > ~~~ snip ~~~ > import notmuch > > with notmuch.Database() as db: > query = notmuch.Query(db, 'a').search_messages() > for msg in query: > msg.get_header('from') > > with notmuch.Database() as db: > query = notmuch.Query(db, 'a').search_messages() > for msg in list(query): > msg.get_header('from') > ~~~ snap ~~~ > > % python test.py > Error opening /home/teythoon/Maildir/.lists.notmuch/cur/1323251462.M53044P18514.thinkbox,S=7306,W=7466:2,: Too many open files > Traceback (most recent call last): > File "test.py", line 11, in > msg.get_header('from') > File "/home/teythoon/.local/lib/python2.7/site-packages/notmuch/message.py", line 237, in get_header > raise NullPointerError() > notmuch.errors.NullPointerError > > Observe that it blows up in line 11, the first version works. The only > difference is that the second version creates a list from the notmuch > query. This prevents the garbage collector from collecting the message > objects and thus closing the file handles. So here's your fix: > > ~~~ snip ~~~ > diff --git a/nbook b/nbook > index 387c71d..b3d4fd6 100755 > --- a/nbook > +++ b/nbook > @@ -173,7 +173,7 @@ class AddressHeaders(object): > # Search > db = Database() > query = Query(db, 'from:"{0}" or to:"{0}"'.format(querystr)) > -msgs = list(query.search_messages()) > +msgs = query.search_messages() > > addresses = AddressHeaders(msgs, querystr) > print addresses > ~~~ snap ~~~ > This explanation helped me a lot, thanks! > A few more comments: > > > from notmuch import * > > Please avoid * imports, they prevent tools like pyflakes from checking > whether you accidentally misspelled any identifiers. > Point taken. I'll be more careful in the future. :) > > pyversion = float('%d.%d' % (sys.version_info.major, sys.version_info.minor)) > > if pyversion < 2.7: > > Converting this to float feels wrong. Consider doing sth like > > if sys.version_info.major > 2 or (sys.version_info.major == 2 and sys.version_info.minor >= 7): > I incorporated these suggestions too. > > print '`nbook\' needs Python 2.7 or higher for argparse' > > Note that in py3k print is a function and not a statement, so you need > to use braces. Consider dropping this at the beginning of all your > python files to make py2.7 use the new features: > > from __future__ import print_function, absolute_import, unicode_literals > > > exit(-1) > > exit is not a builtin function. You have to use sys.exit. Tools like > pyflakes can spot this kind of mistakes. Also, sys.exit also accepts a > string as argument which it prints to stderr before exiting with an > error code. > I will read-up some more about the above suggestions and update accordingly. > > self.__fromhdr__ += ',' + msg.get_header('from') > > Hm, this is somewhat unpythonic. It used to be the case that building > strings this way was a lot slower than building a list and then > joining it on a delimiter of your choice > (i.e. ','.join(from_headers)). This is (was?) because strings are > immutable in python and constantly creating strings just to throw them > away in the next iteration puts a lot of pressure on the memory > management system. Somewhat recent discussion here: > > http://stackoverflow.com/questions/1316887/what-is-the-most-efficient-string-concatenation-method-in-python > I had a commit with ','.join(..) in a private branch, but thanks for pointing out the reasons and the links to the discussion. This was very helpful. > > def print_addrs(self, fmtstr='', query=''): > > if '' == fmtstr: fmtstr = '%s %s\n' > > Ok, several things here: > > * The comparison looks weird, you are using the string constant as the > first operand. While this is technically not wrong, it is somewhat > unpythonic b/c if you read it out loud (''if the empty string is > equal to fmtstr'') it somewhat bends the 1:1 mapping of the semantic > of your program and the English sentence. It looks like this c hack > that is actually unnecessary in python b/c you cannot use the > assignment operator as a value (except for a=b=c=0 style > assignments). > Yes you are correct, I'm more used to C/C++ and the reason you mention is why I tend to write comparisons like that. I'll retrain my fingers for python from now on. > * Please don't put multiple statements in one line. > I will keep that in mind for the future. > * This can be written shorter and more idiomatic (yay keyword > arguments): > > def print_addrs(self, fmtstr='%s %s\n', query=''): > [...] > That was silly of me not to do that in the first place! :-p > Happy hacking :) > Justus Thank you soo much for this incredibly informative response. I learned a lot. Cheers, -- Suvayu Open source is the future. It sets us free.