unofficial mirror of notmuch@notmuchmail.org
 help / color / mirror / code / Atom feed
From: Suvayu Ali <fatkasuvayu+linux@gmail.com>
To: notmuch@notmuchmail.org
Subject: Re: nbook: a notmuch based address book written in python
Date: Tue, 16 Oct 2012 16:55:03 +0200	[thread overview]
Message-ID: <20121016145503.GC11488@kuru.dyndns-at-home.com> (raw)
In-Reply-To: <20121015105830.12412.43278@thinkbox.jade-hamburg.de>

Hi Justus,

I finally had time to go through your response carefully.

On Mon, Oct 15, 2012 at 12:58:30PM +0200, Justus Winter wrote:
> 
> > > > -------------------------------
> > > > [~] time nbook Patrick                     
> > > > 
> > > > Error opening /home/pazz/mail/gmail/[Google Mail].All Mail/cur/1330682270_0.12958.megatron,U=8766,FMD5=66ff6a8bc18a8a3ac4b311daa93d358a:2,S: Too many open files
> > > > Traceback (most recent call last):
> > > >   File "/home/pazz/bin/nbook", line 167, in <module>
> > > >   File "/home/pazz/bin/nbook", line 71, in __init__
> > > >   File "/home/pazz/.local/lib/python2.7/site-packages/notmuch/message.py", line 233, in get_header
> > > > notmuch.errors.NullPointerError

[...]

> > As mentioned before, I think you invalidate the Database object concurrently
> > while your long-running algorithm goes through all messages.
> > Xapian doesn't handle concurrent access to the index like a normal™ database would.
> > This means you are notified by this error that some changes were detected.
> > Maybe the error message should be more telling here though. Teythoon?
> 
> The reason for this error is exactly what the error message says, you
> are opening to many files. Check out this limit using ulimit -n:
> 
> % ulimit -n
> 4096
> 
> This problem is subtle. Here is a minimal test case:
> 
> ~~~ snip ~~~
> import notmuch
> 
> with notmuch.Database() as db:
>     query = notmuch.Query(db, 'a').search_messages()
>     for msg in query:
>         msg.get_header('from')
> 
> with notmuch.Database() as db:
>     query = notmuch.Query(db, 'a').search_messages()
>     for msg in list(query):
>         msg.get_header('from')
> ~~~ snap ~~~
> 
> % python test.py
> Error opening /home/teythoon/Maildir/.lists.notmuch/cur/1323251462.M53044P18514.thinkbox,S=7306,W=7466:2,: Too many open files
> Traceback (most recent call last):
>   File "test.py", line 11, in <module>
>     msg.get_header('from')
>   File "/home/teythoon/.local/lib/python2.7/site-packages/notmuch/message.py", line 237, in get_header
>     raise NullPointerError()
> notmuch.errors.NullPointerError
> 
> Observe that it blows up in line 11, the first version works. The only
> difference is that the second version creates a list from the notmuch
> query. This prevents the garbage collector from collecting the message
> objects and thus closing the file handles. So here's your fix:
> 
> ~~~ snip ~~~
> diff --git a/nbook b/nbook
> index 387c71d..b3d4fd6 100755
> --- a/nbook
> +++ b/nbook
> @@ -173,7 +173,7 @@ class AddressHeaders(object):
>  # Search
>  db = Database()
>  query = Query(db, 'from:"{0}" or to:"{0}"'.format(querystr))
> -msgs = list(query.search_messages())
> +msgs = query.search_messages()
>  
>  addresses = AddressHeaders(msgs, querystr)
>  print addresses
> ~~~ snap ~~~
> 

This explanation helped me a lot, thanks!

> A few more comments:
> 
> > from notmuch import *
> 
> Please avoid * imports, they prevent tools like pyflakes from checking
> whether you accidentally misspelled any identifiers.
> 

Point taken.  I'll be more careful in the future.  :)

> > pyversion = float('%d.%d' % (sys.version_info.major, sys.version_info.minor))
> > if pyversion < 2.7:
> 
> Converting this to float feels wrong. Consider doing sth like
> 
> if sys.version_info.major > 2 or (sys.version_info.major == 2 and sys.version_info.minor >= 7):
> 

I incorporated these suggestions too.

> >     print '`nbook\' needs Python 2.7 or higher for argparse'
> 
> Note that in py3k print is a function and not a statement, so you need
> to use braces. Consider dropping this at the beginning of all your
> python files to make py2.7 use the new features:
> 
> from __future__ import print_function, absolute_import, unicode_literals
> 
> >     exit(-1)
> 
> exit is not a builtin function. You have to use sys.exit. Tools like
> pyflakes can spot this kind of mistakes. Also, sys.exit also accepts a
> string as argument which it prints to stderr before exiting with an
> error code.
> 

I will read-up some more about the above suggestions and update
accordingly.

> >         self.__fromhdr__ += ',' + msg.get_header('from')
> 
> Hm, this is somewhat unpythonic. It used to be the case that building
> strings this way was a lot slower than building a list and then
> joining it on a delimiter of your choice
> (i.e. ','.join(from_headers)). This is (was?) because strings are
> immutable in python and constantly creating strings just to throw them
> away in the next iteration puts a lot of pressure on the memory
> management system. Somewhat recent discussion here:
> 
> http://stackoverflow.com/questions/1316887/what-is-the-most-efficient-string-concatenation-method-in-python
> 

I had a commit with ','.join(..) in a private branch, but thanks for
pointing out the reasons and the links to the discussion.  This was very
helpful.

> >     def print_addrs(self, fmtstr='', query=''):
> >         if '' == fmtstr: fmtstr = '%s    %s\n'
> 
> Ok, several things here:
> 
> * The comparison looks weird, you are using the string constant as the
>   first operand. While this is technically not wrong, it is somewhat
>   unpythonic b/c if you read it out loud (''if the empty string is
>   equal to fmtstr'') it somewhat bends the 1:1 mapping of the semantic
>   of your program and the English sentence. It looks like this c hack
>   that is actually unnecessary in python b/c you cannot use the
>   assignment operator as a value (except for a=b=c=0 style
>   assignments).
> 

Yes you are correct, I'm more used to C/C++ and the reason you mention
is why I tend to write comparisons like that.  I'll retrain my fingers
for python from now on.

> * Please don't put multiple statements in one line.
> 

I will keep that in mind for the future.

> * This can be written shorter and more idiomatic (yay keyword
>   arguments):
> 
>     def print_addrs(self, fmtstr='%s    %s\n', query=''):
>         [...]
> 

That was silly of me not to do that in the first place! :-p

> Happy hacking :)
> Justus

Thank you soo much for this incredibly informative response.  I learned
a lot.

Cheers,

-- 
Suvayu

Open source is the future. It sets us free.

  reply	other threads:[~2012-10-16 14:55 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-24  8:26 nbook: a notmuch based address book written in python Suvayu Ali
2012-09-25 10:44 ` Patrick Totzke
2012-10-08  9:34   ` Suvayu Ali
2012-10-13 16:58     ` Patrick Totzke
2012-10-15 10:58       ` Justus Winter
2012-10-16 14:55         ` Suvayu Ali [this message]
2012-10-15 11:52       ` Suvayu Ali

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://notmuchmail.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121016145503.GC11488@kuru.dyndns-at-home.com \
    --to=fatkasuvayu+linux@gmail.com \
    --cc=notmuch@notmuchmail.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://yhetil.org/notmuch.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).