unofficial mirror of notmuch@notmuchmail.org
* nbook: a notmuch based address book written in python
@ 2012-09-24  8:26 Suvayu Ali
  2012-09-25 10:44 ` Patrick Totzke
  0 siblings, 1 reply; 7+ messages in thread
From: Suvayu Ali @ 2012-09-24  8:26 UTC (permalink / raw)
  To: notmuch

Hi,

(I'm not subscribed to the list, so please cc: me in your replies)

I am a new notmuch user.  I use it with mutt.  I wanted a Gmail-like
address book, so I used the python bindings for notmuch to write a small
python program.  This is supposed to behave like abook.  Quality
standards permitting, I would like this to be included in the contrib
directory with notmuch.  I have tested it a little, but there could be
bugs lurking somewhere.  I have created a github repository[1], which
may be treated as upstream for this program.

This is my first serious python program, and the first time I'm
contributing something significant to an open source project.  So please
let me know if I have missed something.  Looking forward to your
feedback.

Cheers,

Footnotes:

[1] <https://github.com/suvayu/nbook>

-- 
Suvayu

Open source is the future. It sets us free.

* Re: nbook: a notmuch based address book written in python
  2012-09-24  8:26 nbook: a notmuch based address book written in python Suvayu Ali
@ 2012-09-25 10:44 ` Patrick Totzke
  2012-10-08  9:34   ` Suvayu Ali
  0 siblings, 1 reply; 7+ messages in thread
From: Patrick Totzke @ 2012-09-25 10:44 UTC (permalink / raw)
  To: Suvayu Ali, notmuch

Hey Suvayu, welcome to notmuch!

I hope you are aware that there are already a few search based abook tools
around for notmuch (listed in the wiki, albeit hidden in the emacs docs):
http://notmuchmail.org/emacstips/#index14h2
I personally use nottoomuch-addresses.sh, which apparently does some advanced
caching voodoo for speed.

But to your tool; practice test:
I wasn't able to use wildcards or simply prefixes of names. This is essential
if you want to use it for tabcompleting contacts in a MUA.
The time lookups take seems to depend on how many matches there are:

-------------------------------
time nbook Suvayu
1 unique email addresses found for `Suvayu'
fatkasuvayu+linux@gmail.com     Suvayu Ali

nbook Suvayu  0.04s user 0.01s system 95% cpu 0.050 total
-------------------------------
time nbook Justus
...

nbook Justus  0.21s user 0.07s system 11% cpu 2.484 total
-------------------------------
And if I look for my own name, this takes over a minute,
eventually dying. This could be an issue with libnotmuch though.
Possibly, your algorithm takes very long and then reads from an initially
opened Database object again, which was invalidated by concurrent writes of other processes.

-------------------------------
[~] time nbook Patrick                     

Error opening /home/pazz/mail/gmail/[Google Mail].All Mail/cur/1330682270_0.12958.megatron,U=8766,FMD5=66ff6a8bc18a8a3ac4b311daa93d358a:2,S: Too many open files
Traceback (most recent call last):
  File "/home/pazz/bin/nbook", line 167, in <module>
  File "/home/pazz/bin/nbook", line 71, in __init__
  File "/home/pazz/.local/lib/python2.7/site-packages/notmuch/message.py", line 233, in get_header
notmuch.errors.NullPointerError
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 66, in apport_excepthook
ImportError: No module named fileutils

Original exception was:
Traceback (most recent call last):
  File "/home/pazz/bin/nbook", line 167, in <module>
  File "/home/pazz/bin/nbook", line 71, in __init__
  File "/home/pazz/.local/lib/python2.7/site-packages/notmuch/message.py", line 233, in get_header
notmuch.errors.NullPointerError
nbook Patrick  3.20s user 5.47s system 12% cpu 1:11.65 total
------------------------------------

Anyway, have fun hacking notmuch! If you are looking for a related project to bring in your python skills
I could think of one or two :D
Best,
/p

* Re: nbook: a notmuch based address book written in python
  2012-09-25 10:44 ` Patrick Totzke
@ 2012-10-08  9:34   ` Suvayu Ali
  2012-10-13 16:58     ` Patrick Totzke
  0 siblings, 1 reply; 7+ messages in thread
From: Suvayu Ali @ 2012-10-08  9:34 UTC (permalink / raw)
  To: notmuch

Hi Patrick,

Sorry for the very late reply; I got distracted with some personal
matters.

On Tue, Sep 25, 2012 at 11:44:57AM +0100, Patrick Totzke wrote:
> Hey Suvayu, welcome to notmuch!
> 
> I hope you are aware that there are already a few search based abook tools
> around for notmuch (listed in the wiki, albeit hidden in the emacs docs):
> http://notmuchmail.org/emacstips/#index14h2
> I personally use nottoomuch-addresses.sh, which apparently does some advanced
> caching voodoo for speed.
> 

I wasn't aware of either of them, thanks for pointing them out.  I'll
take a look for inspiration and ideas.

> But to your tool; practice test:
> I wasn't able to use wildcards or simply prefixes of names. This is essential
> if you want to use it for tabcompleting contacts in a MUA.

Since the idea was inspired by the completion on the Gmail web
interface, I already do a partial search so wildcards should not be
necessary.

> The time lookups take seems to depend on how many matches there are:
> 
> -------------------------------
> time nbook Suvayu
> 1 unique email addresses found for `Suvayu'
> fatkasuvayu+linux@gmail.com     Suvayu Ali
> 
> nbook Suvayu  0.04s user 0.01s system 95% cpu 0.050 total
> -------------------------------
> time nbook Justus
> ...
> 
> nbook Justus  0.21s user 0.07s system 11% cpu 2.484 total
> -------------------------------

Yes, I noticed this too when I searched for the more common names.  Not
sure how to get around this though.

> And if I look for my own name, this takes over a minute,
> eventually dying. This could be an issue with libnotmuch though.
> Possibly, your algorithm takes very long and then reads from an initially
> opened Database object again, which was invalidated by concurrent writes of other processes.
> 
> -------------------------------
> [~] time nbook Patrick                     
> 
> Error opening /home/pazz/mail/gmail/[Google Mail].All Mail/cur/1330682270_0.12958.megatron,U=8766,FMD5=66ff6a8bc18a8a3ac4b311daa93d358a:2,S: Too many open files
> Traceback (most recent call last):
>   File "/home/pazz/bin/nbook", line 167, in <module>
>   File "/home/pazz/bin/nbook", line 71, in __init__
>   File "/home/pazz/.local/lib/python2.7/site-packages/notmuch/message.py", line 233, in get_header
> notmuch.errors.NullPointerError
> Error in sys.excepthook:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 66, in apport_excepthook
> ImportError: No module named fileutils
> 
> Original exception was:
> Traceback (most recent call last):
>   File "/home/pazz/bin/nbook", line 167, in <module>
>   File "/home/pazz/bin/nbook", line 71, in __init__
>   File "/home/pazz/.local/lib/python2.7/site-packages/notmuch/message.py", line 233, in get_header
> notmuch.errors.NullPointerError
> nbook Patrick  3.20s user 5.47s system 12% cpu 1:11.65 total
> ------------------------------------
> 

Yes someone else pointed this out too.  Again I'm not sure how to
proceed here.  I had a quick look at this last week and it seemed to me
the limitation comes from within the python bindings for notmuch.  Do
you have any ideas?

> Anyway, have fun hacking notmuch! If you are looking for a related project to bring in your python skills
> I could think of one or two :D

That would be wonderful.  To give you my background, I'm a graduate
student in physics and I have to do a lot of C/C++ and python
programming for my research.  Contributing to FOSS projects seems like a
wonderful way to learn collaboration and clean programming (we
physicists tend to be sloppy programmers :-p).

> Best,
> /p

Cheers,

-- 
Suvayu

Open source is the future. It sets us free.

* Re: nbook: a notmuch based address book written in python
  2012-10-08  9:34   ` Suvayu Ali
@ 2012-10-13 16:58     ` Patrick Totzke
  2012-10-15 10:58       ` Justus Winter
  2012-10-15 11:52       ` Suvayu Ali
  0 siblings, 2 replies; 7+ messages in thread
From: Patrick Totzke @ 2012-10-13 16:58 UTC (permalink / raw)
  To: Suvayu Ali, notmuch

Quoting Suvayu Ali (2012-10-08 10:34:29)
> Hi Patrick,
> 
> Sorry for the very late reply; I got distracted with some personal
> matters.
> 
> On Tue, Sep 25, 2012 at 11:44:57AM +0100, Patrick Totzke wrote:
> > Hey Suvayu, welcome to notmuch!
> > 
> > I hope you are aware that there are already a few search based abook tools
> > around for notmuch (listed in the wiki, albeit hidden in the emacs docs):
> > http://notmuchmail.org/emacstips/#index14h2
> > I personally use nottoomuch-addresses.sh, which apparently does some advanced
> > caching voodoo for speed.
> > 
> 
> I wasn't aware of either of them, thanks for pointing them out.  I'll
> take a look for inspiration and ideas.
> 
> > But to your tool; practice test:
> > I wasn't able to use wildcards or simply prefixes of names. This is essential
> > if you want to use it for tabcompleting contacts in a MUA.
> 
> Since the idea was inspired by the completion on the Gmail web
> interface, I already do a partial search so wildcards should not be
> necessary.

Not sure what you mean here: If I compose a mail using Gmail's web interface
and type a prefix of someone's name I will get this contact as a suggestion.
My point was that using your tool, I did not get a contact suggested
for all prefixes.

> > The time lookups take seems to depend on how many matches there are:
> > 
> > -------------------------------
> > time nbook Suvayu
> > 1 unique email addresses found for `Suvayu'
> > fatkasuvayu+linux@gmail.com     Suvayu Ali
> > 
> > nbook Suvayu  0.04s user 0.01s system 95% cpu 0.050 total
> > -------------------------------
> > time nbook Justus
> > ...
> > 
> > nbook Justus  0.21s user 0.07s system 11% cpu 2.484 total
> > -------------------------------
> 
> Yes, I noticed this too when I searched for the more common names.  Not
> sure how to get around this though.

I think this is a conceptual problem with your algorithm:
You look up *all* messages and add a name to your result-list
if it matches. This means you go through a given candidate
as often as your index contains mails from/to them.
What one really wants is to ask the database to do something like
  "SELECT name,email from RECIPIENTS_OR_SENDER"
where RECIPIENTS_OR_SENDER is some imaginary list that stores
a set of contacts.

Bottom line: One would have to change the layout of the underlying
database (not likely) or regularly update some cache
and only work on that. This is what some of the mentioned tools do if I'm not mistaken.
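
The cache idea could be sketched roughly like this. It is only an
illustration, not what nottoomuch-addresses.sh or nbook actually do: the
function names and the JSON cache format are made up, and the From/To
header strings are passed in as plain text instead of being read from
notmuch.

```python
import json
from email.utils import getaddresses


def collect_contacts(header_values):
    """Return the set of unique (name, address) pairs found in the headers.

    `header_values` is any iterable of raw From/To header strings; how
    they are obtained (e.g. from notmuch) is left out of this sketch.
    """
    contacts = set()
    for name, address in getaddresses(list(header_values)):
        if address:
            # Lower-case the address so the same mailbox is stored once.
            contacts.add((name, address.lower()))
    return contacts


def save_cache(contacts, path):
    """Write the contacts to a JSON file (hypothetical cache format)."""
    with open(path, 'w') as cache:
        json.dump(sorted(contacts), cache)


def load_cache(path):
    """Read the contacts back as a set of (name, address) tuples."""
    with open(path) as cache:
        return set(tuple(entry) for entry in json.load(cache))
```

A lookup would then only touch the cache file, and the expensive pass
over all messages happens once per cache update instead of once per query.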

> > And if I look for my own name, this takes over a minute,
> > eventually dying. This could be an issue with libnotmuch though.
> > Possibly, your algorithm takes very long and then reads from an initially
> > opened Database object again, which was invalidated by concurrent writes of other processes.
> > 
> > -------------------------------
> > [~] time nbook Patrick                     
> > 
> > Error opening /home/pazz/mail/gmail/[Google Mail].All Mail/cur/1330682270_0.12958.megatron,U=8766,FMD5=66ff6a8bc18a8a3ac4b311daa93d358a:2,S: Too many open files
> > Traceback (most recent call last):
> >   File "/home/pazz/bin/nbook", line 167, in <module>
> >   File "/home/pazz/bin/nbook", line 71, in __init__
> >   File "/home/pazz/.local/lib/python2.7/site-packages/notmuch/message.py", line 233, in get_header
> > notmuch.errors.NullPointerError
> > Error in sys.excepthook:
> > Traceback (most recent call last):
> >   File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 66, in apport_excepthook
> > ImportError: No module named fileutils
> > 
> > Original exception was:
> > Traceback (most recent call last):
> >   File "/home/pazz/bin/nbook", line 167, in <module>
> >   File "/home/pazz/bin/nbook", line 71, in __init__
> >   File "/home/pazz/.local/lib/python2.7/site-packages/notmuch/message.py", line 233, in get_header
> > notmuch.errors.NullPointerError
> > nbook Patrick  3.20s user 5.47s system 12% cpu 1:11.65 total
> > ------------------------------------
> > 
> 
> Yes someone else pointed this out too.  Again I'm not sure how to
> proceed here.  I had a quick look at this last week and it seemed to me
> the limitation comes from within the python bindings for notmuch.  Do
> you have any ideas?

As mentioned before, I think you invalidate the Database object concurrently
while your long-running algorithm goes through all messages.
Xapian doesn't handle concurrent access to the index like a normal™ database would.
This means you are notified by this error that some changes were detected.
Maybe the error message should be more telling here though. Teythoon?

> > Anyway, have fun hacking notmuch! If you are looking for a related project to bring in your python skills
> > I could think of one or two :D
> 
> That would be wonderful.  To give you my background, I'm a graduate
> student in physics and I have to do a lot of C/C++ and python
> programming for my research.  Contributing to FOSS projects seems like a
> wonderful way to learn collaboration and clean programming (we
> physicists tend to be sloppy programmers :-p).

https://github.com/teythoon/afew
https://github.com/pazz/alot
http://excess.org/urwid/

I'm sure patches will be welcome to any of the above :)
Best,
/p

* Re: nbook: a notmuch based address book written in python
  2012-10-13 16:58     ` Patrick Totzke
@ 2012-10-15 10:58       ` Justus Winter
  2012-10-16 14:55         ` Suvayu Ali
  2012-10-15 11:52       ` Suvayu Ali
  1 sibling, 1 reply; 7+ messages in thread
From: Justus Winter @ 2012-10-15 10:58 UTC (permalink / raw)
  To: Patrick Totzke, Suvayu Ali, notmuch

Hi Suvayu :)

welcome to notmuch and python.

Quoting Patrick Totzke (2012-10-13 18:58:51)
> > > And if I look for my own name, this takes over a minute,
> > > eventually dying. This could be an issue with libnotmuch though.
> > > Possibly, your algorithm takes very long and then reads from an initially
> > > opened Database object again, which was invalidated by concurrent writes of other processes.

Hm no, see below.

> > > -------------------------------
> > > [~] time nbook Patrick                     
> > > 
> > > Error opening /home/pazz/mail/gmail/[Google Mail].All Mail/cur/1330682270_0.12958.megatron,U=8766,FMD5=66ff6a8bc18a8a3ac4b311daa93d358a:2,S: Too many open files
> > > Traceback (most recent call last):
> > >   File "/home/pazz/bin/nbook", line 167, in <module>
> > >   File "/home/pazz/bin/nbook", line 71, in __init__
> > >   File "/home/pazz/.local/lib/python2.7/site-packages/notmuch/message.py", line 233, in get_header
> > > notmuch.errors.NullPointerError
> > > Error in sys.excepthook:
> > > Traceback (most recent call last):
> > >   File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 66, in apport_excepthook
> > > ImportError: No module named fileutils
> > > 
> > > Original exception was:
> > > Traceback (most recent call last):
> > >   File "/home/pazz/bin/nbook", line 167, in <module>
> > >   File "/home/pazz/bin/nbook", line 71, in __init__
> > >   File "/home/pazz/.local/lib/python2.7/site-packages/notmuch/message.py", line 233, in get_header
> > > notmuch.errors.NullPointerError
> > > nbook Patrick  3.20s user 5.47s system 12% cpu 1:11.65 total
> > > ------------------------------------
> > > 
> > 
> > Yes someone else pointed this out too.  Again I'm not sure how to
> > proceed here.  I had a quick look at this last week and it seemed to me
> > the limitation comes from within the python bindings for notmuch.  Do
> > you have any ideas?
> 
> As mentioned before, I think you invalidate the Database object concurrently
> while your long-running algorithm goes through all messages.
> Xapian doesn't handle concurrent access to the index like a normal™ database would.
> This means you are notified by this error that some changes were detected.
> Maybe the error message should be more telling here though. Teythoon?

The reason for this error is exactly what the error message says: you
are opening too many files. Check out this limit using ulimit -n:

% ulimit -n
4096
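
The same limit can also be read from inside Python with the standard
resource module (Unix only). A small sketch:

```python
from __future__ import print_function

import resource

# RLIMIT_NOFILE is the per-process limit on open file descriptors;
# the soft limit is the value that `ulimit -n` reports.
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print('soft limit: %d, hard limit: %d' % (soft_limit, hard_limit))
```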

This problem is subtle. Here is a minimal test case:

~~~ snip ~~~
import notmuch

with notmuch.Database() as db:
    query = notmuch.Query(db, 'a').search_messages()
    for msg in query:
        msg.get_header('from')

with notmuch.Database() as db:
    query = notmuch.Query(db, 'a').search_messages()
    for msg in list(query):
        msg.get_header('from')
~~~ snap ~~~

% python test.py
Error opening /home/teythoon/Maildir/.lists.notmuch/cur/1323251462.M53044P18514.thinkbox,S=7306,W=7466:2,: Too many open files
Traceback (most recent call last):
  File "test.py", line 11, in <module>
    msg.get_header('from')
  File "/home/teythoon/.local/lib/python2.7/site-packages/notmuch/message.py", line 237, in get_header
    raise NullPointerError()
notmuch.errors.NullPointerError

Observe that it blows up in line 11, the first version works. The only
difference is that the second version creates a list from the notmuch
query. This prevents the garbage collector from collecting the message
objects and thus closing the file handles. So here's your fix:

~~~ snip ~~~
diff --git a/nbook b/nbook
index 387c71d..b3d4fd6 100755
--- a/nbook
+++ b/nbook
@@ -173,7 +173,7 @@ class AddressHeaders(object):
 # Search
 db = Database()
 query = Query(db, 'from:"{0}" or to:"{0}"'.format(querystr))
-msgs = list(query.search_messages())
+msgs = query.search_messages()
 
 addresses = AddressHeaders(msgs, querystr)
 print addresses
~~~ snap ~~~
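
To see the effect without a mail store, here is a toy model. It does not
use notmuch at all: FakeMessage, fake_search and read_headers are made-up
names, the "file handle" is just a counter, and the behaviour shown relies
on CPython freeing an object as soon as the last reference to it goes away.

```python
from __future__ import print_function


class FakeMessage(object):
    """Stand-in for a message that keeps a file open once a header is read."""

    open_handles = 0   # handles currently held by live objects
    peak_handles = 0   # highest value open_handles has reached

    def __init__(self):
        self._has_handle = False

    def get_header(self, name):
        # The first header access "opens the file" and keeps it open.
        if not self._has_handle:
            self._has_handle = True
            FakeMessage.open_handles += 1
            FakeMessage.peak_handles = max(FakeMessage.peak_handles,
                                           FakeMessage.open_handles)
        return 'dummy value for ' + name

    def __del__(self):
        # Runs when the object is collected, releasing the "handle".
        if self._has_handle:
            FakeMessage.open_handles -= 1


def fake_search(count):
    """Yield messages one at a time, like iterating over a query result."""
    for _ in range(count):
        yield FakeMessage()


def read_headers(messages):
    """Read a header from every message and return the peak handle count."""
    FakeMessage.peak_handles = FakeMessage.open_handles
    for msg in messages:
        msg.get_header('from')
    return FakeMessage.peak_handles


print('iterating lazily:', read_headers(fake_search(1000)))
print('iterating a list:', read_headers(list(fake_search(1000))))
```

With the lazy iterator each message can be freed before the next header
is read, so the peak stays tiny. With the list every message stays alive
until the loop is done, so the peak grows with the number of matches.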

A few more comments:

> from notmuch import *

Please avoid * imports, they prevent tools like pyflakes from checking
whether you accidentally misspelled any identifiers.

> pyversion = float('%d.%d' % (sys.version_info.major, sys.version_info.minor))
> if pyversion < 2.7:

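Converting this to float feels wrong. Consider doing something like
this (note that the condition is true when the version is new enough,
so it is the opposite of your check):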

if sys.version_info.major > 2 or (sys.version_info.major == 2 and sys.version_info.minor >= 7):
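
A possibly simpler variant: sys.version_info is a tuple, so it can be
compared against another tuple directly. The helper name below is made
up for illustration.

```python
import sys


def python_is_at_least(minimum, version_info=sys.version_info):
    """Return True if version_info is at least `minimum`, e.g. (2, 7)."""
    # Tuples compare element by element, so (2, 10) sorts above (2, 7),
    # whereas float('2.10') is 2.1 and would sort below 2.7.
    return tuple(version_info[:2]) >= tuple(minimum)
```

This also avoids the float problem with two-digit minor versions.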

>     print '`nbook\' needs Python 2.7 or higher for argparse'

Note that in py3k print is a function and not a statement, so you need
to use parentheses. Consider dropping this at the beginning of all your
python files to make py2.7 use the new features:

from __future__ import print_function, absolute_import, unicode_literals

>     exit(-1)

exit is not a real builtin function; it is added by the site module for
interactive use. You should use sys.exit. Tools like pyflakes can spot
this kind of mistake. sys.exit also accepts a string as argument, which
it prints to stderr before exiting with an error code.
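
Put together, the check could look roughly like this. The function name
is made up, and the version is passed in as a parameter only so that the
sketch can be tried without an old interpreter.

```python
import sys


def require_python(minimum, version_info=sys.version_info):
    """Exit with an error message unless the interpreter is new enough.

    `minimum` is a (major, minor) pair such as (2, 7).
    """
    if tuple(version_info[:2]) < tuple(minimum):
        # A string argument is printed to stderr and the exit status is 1.
        sys.exit("`nbook' needs Python %d.%d or higher for argparse"
                 % tuple(minimum))
```

sys.exit works by raising SystemExit, so the message ends up on stderr
when the exception reaches the top level.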

>         self.__fromhdr__ += ',' + msg.get_header('from')

Hm, this is somewhat unpythonic. It used to be the case that building
strings this way was a lot slower than building a list and then
joining it on a delimiter of your choice
(i.e. ','.join(from_headers)). This is (was?) because strings are
immutable in python and constantly creating strings just to throw them
away in the next iteration puts a lot of pressure on the memory
management system. Somewhat recent discussion here:

http://stackoverflow.com/questions/1316887/what-is-the-most-efficient-string-concatenation-method-in-python
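
A minimal sketch of the list-and-join approach. The function name is made
up, and the header values are plain strings here instead of coming from
msg.get_header.

```python
def join_headers(header_values):
    """Collect header strings in a list and join them once at the end."""
    parts = []
    for value in header_values:
        if value:            # skip empty or missing headers
            parts.append(value)
    return ','.join(parts)
```

Besides the performance argument, join only puts the delimiter between
items, so there is no stray leading comma to strip afterwards.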

>     def print_addrs(self, fmtstr='', query=''):
>         if '' == fmtstr: fmtstr = '%s    %s\n'

Ok, several things here:

* The comparison looks weird, you are using the string constant as the
  first operand. While this is technically not wrong, it is somewhat
  unpythonic b/c if you read it out loud (''if the empty string is
  equal to fmtstr'') it somewhat bends the 1:1 mapping between the
  semantics of your program and the English sentence. It looks like the
  C hack that is actually unnecessary in python b/c you cannot use the
  assignment operator as a value (except for a=b=c=0 style
  assignments).

* Please don't put multiple statements in one line.

* This can be written shorter and more idiomatically (yay keyword
  arguments):

    def print_addrs(self, fmtstr='%s    %s\n', query=''):
        [...]
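
As a standalone illustration of the default argument (a made-up function,
not the actual method body from nbook):

```python
def format_addrs(addresses, fmtstr='%s    %s\n'):
    """Format (address, name) pairs, one per line by default."""
    return ''.join(fmtstr % (address, name) for address, name in addresses)
```

Callers that are happy with the default leave fmtstr out; others pass
their own, e.g. format_addrs(pairs, fmtstr='%s\t%s\n').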

Happy hacking :)
Justus

* Re: nbook: a notmuch based address book written in python
  2012-10-13 16:58     ` Patrick Totzke
  2012-10-15 10:58       ` Justus Winter
@ 2012-10-15 11:52       ` Suvayu Ali
  1 sibling, 0 replies; 7+ messages in thread
From: Suvayu Ali @ 2012-10-15 11:52 UTC (permalink / raw)
  To: notmuch

Hello Patrick,

On Sat, Oct 13, 2012 at 05:58:51PM +0100, Patrick Totzke wrote:
> Quoting Suvayu Ali (2012-10-08 10:34:29)
> > 
> > > But to your tool; practice test:
> > > I wasn't able to use wildcards or simply prefixes of names. This is essential
> > > if you want to use it for tabcompleting contacts in a MUA.
> > 
> > Since the idea was inspired by the completion on the Gmail web
> > interface, I already do a partial search so wildcards should not be
> > necessary.
> 
> Not sure what you mean here: If I compose a mail using Gmail's web interface
> and type a prefix of someone's name I will get this contact as a suggestion.
> My point was that using your tool, I did not get a contact suggested
> for all prefixes.
> 

What I meant was, I search for *<query>* in the name or email address
strings.  So adding a glob character is not needed; in fact adding it
would mean my algorithm would search for a literal "*" and fail.
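
Roughly, the matching step amounts to something like this (a simplified
sketch with a made-up helper name, not the actual nbook code):

```python
def matches(query, name, address):
    """Case-insensitive substring test against a name or an address."""
    needle = query.lower()
    return needle in name.lower() or needle in address.lower()
```

Here matches('suv', 'Suvayu Ali', 'someone@example.org') is true, while
a query of 'suv*' is not, because the "*" is compared literally.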

[...]

> 
> I think this is a conceptual problem with your algorithm:
> You look up *all* messages and add a name to your result-list
> if it matches. This means you go through a given candidate
> as often as your index contains mails from/to them.
> What one really wants is to ask the database to do something like
>   "SELECT name,email from RECIPIENTS_OR_SENDER"
> where RECIPIENTS_OR_SENDER is some imaginary list that stores
> a set of contacts.
> 
> Bottom line: One would have to change the layout of the underlying
> database (not likely) or regularly update some cache
> and only work on that. This is what some of the mentioned tools do if I'm not mistaken.
> 

Yes, you are right.  I realised this too when I tried out
nottoomuch-addresses a few days back.  Caching seems to be the solution
for performance issues.

[...]

> > > -------------------------------
> > > [~] time nbook Patrick                     
> > > 
> > > Error opening /home/pazz/mail/gmail/[Google Mail].All Mail/cur/1330682270_0.12958.megatron,U=8766,FMD5=66ff6a8bc18a8a3ac4b311daa93d358a:2,S: Too many open files
> > > Traceback (most recent call last):
> > >   File "/home/pazz/bin/nbook", line 167, in <module>
> > >   File "/home/pazz/bin/nbook", line 71, in __init__
> > >   File "/home/pazz/.local/lib/python2.7/site-packages/notmuch/message.py", line 233, in get_header
> > > notmuch.errors.NullPointerError

[...]

I see that while I was writing this email, Justus gave an
explanation for the issue; I'll go through the response carefully.

> 
> https://github.com/teythoon/afew
> https://github.com/pazz/alot
> http://excess.org/urwid/
> 
> I'm sure patches will be welcome to any of the above :)

All 3 seem very interesting, but I think I will take a closer look at
afew and urwid.

Thanks for the pointers,

:)

-- 
Suvayu

Open source is the future. It sets us free.

* Re: nbook: a notmuch based address book written in python
  2012-10-15 10:58       ` Justus Winter
@ 2012-10-16 14:55         ` Suvayu Ali
  0 siblings, 0 replies; 7+ messages in thread
From: Suvayu Ali @ 2012-10-16 14:55 UTC (permalink / raw)
  To: notmuch

Hi Justus,

I finally had time to go through your response carefully.

On Mon, Oct 15, 2012 at 12:58:30PM +0200, Justus Winter wrote:
> 
> > > > -------------------------------
> > > > [~] time nbook Patrick                     
> > > > 
> > > > Error opening /home/pazz/mail/gmail/[Google Mail].All Mail/cur/1330682270_0.12958.megatron,U=8766,FMD5=66ff6a8bc18a8a3ac4b311daa93d358a:2,S: Too many open files
> > > > Traceback (most recent call last):
> > > >   File "/home/pazz/bin/nbook", line 167, in <module>
> > > >   File "/home/pazz/bin/nbook", line 71, in __init__
> > > >   File "/home/pazz/.local/lib/python2.7/site-packages/notmuch/message.py", line 233, in get_header
> > > > notmuch.errors.NullPointerError

[...]

> > As mentioned before, I think you invalidate the Database object concurrently
> > while your long-running algorithm goes through all messages.
> > Xapian doesn't handle concurrent access to the index like a normal™ database would.
> > This means you are notified by this error that some changes were detected.
> > Maybe the error message should be more telling here though. Teythoon?
> 
> The reason for this error is exactly what the error message says: you
> are opening too many files. Check out this limit using ulimit -n:
> 
> % ulimit -n
> 4096
> 
> This problem is subtle. Here is a minimal test case:
> 
> ~~~ snip ~~~
> import notmuch
> 
> with notmuch.Database() as db:
>     query = notmuch.Query(db, 'a').search_messages()
>     for msg in query:
>         msg.get_header('from')
> 
> with notmuch.Database() as db:
>     query = notmuch.Query(db, 'a').search_messages()
>     for msg in list(query):
>         msg.get_header('from')
> ~~~ snap ~~~
> 
> % python test.py
> Error opening /home/teythoon/Maildir/.lists.notmuch/cur/1323251462.M53044P18514.thinkbox,S=7306,W=7466:2,: Too many open files
> Traceback (most recent call last):
>   File "test.py", line 11, in <module>
>     msg.get_header('from')
>   File "/home/teythoon/.local/lib/python2.7/site-packages/notmuch/message.py", line 237, in get_header
>     raise NullPointerError()
> notmuch.errors.NullPointerError
> 
> Observe that it blows up in line 11, the first version works. The only
> difference is that the second version creates a list from the notmuch
> query. This prevents the garbage collector from collecting the message
> objects and thus closing the file handles. So here's your fix:
> 
> ~~~ snip ~~~
> diff --git a/nbook b/nbook
> index 387c71d..b3d4fd6 100755
> --- a/nbook
> +++ b/nbook
> @@ -173,7 +173,7 @@ class AddressHeaders(object):
>  # Search
>  db = Database()
>  query = Query(db, 'from:"{0}" or to:"{0}"'.format(querystr))
> -msgs = list(query.search_messages())
> +msgs = query.search_messages()
>  
>  addresses = AddressHeaders(msgs, querystr)
>  print addresses
> ~~~ snap ~~~
> 

This explanation helped me a lot, thanks!

> A few more comments:
> 
> > from notmuch import *
> 
> Please avoid * imports, they prevent tools like pyflakes from checking
> whether you accidentally misspelled any identifiers.
> 

Point taken.  I'll be more careful in the future.  :)

> > pyversion = float('%d.%d' % (sys.version_info.major, sys.version_info.minor))
> > if pyversion < 2.7:
> 
> Converting this to float feels wrong. Consider doing sth like
> 
> if sys.version_info.major > 2 or (sys.version_info.major == 2 and sys.version_info.minor >= 7):
> 

I incorporated these suggestions too.

> >     print '`nbook\' needs Python 2.7 or higher for argparse'
> 
> Note that in py3k print is a function and not a statement, so you need
> to use parentheses. Consider dropping this at the beginning of all your
> python files to make py2.7 use the new features:
> 
> from __future__ import print_function, absolute_import, unicode_literals
> 
> >     exit(-1)
> 
> exit is not a builtin function. You have to use sys.exit. Tools like
> pyflakes can spot this kind of mistake. sys.exit also accepts a
> string as argument, which it prints to stderr before exiting with an
> error code.
> 

I will read-up some more about the above suggestions and update
accordingly.

> >         self.__fromhdr__ += ',' + msg.get_header('from')
> 
> Hm, this is somewhat unpythonic. It used to be the case that building
> strings this way was a lot slower than building a list and then
> joining it on a delimiter of your choice
> (i.e. ','.join(from_headers)). This is (was?) because strings are
> immutable in python and constantly creating strings just to throw them
> away in the next iteration puts a lot of pressure on the memory
> management system. Somewhat recent discussion here:
> 
> http://stackoverflow.com/questions/1316887/what-is-the-most-efficient-string-concatenation-method-in-python
> 

I had a commit with ','.join(..) in a private branch, but thanks for
pointing out the reasons and the links to the discussion.  This was very
helpful.

> >     def print_addrs(self, fmtstr='', query=''):
> >         if '' == fmtstr: fmtstr = '%s    %s\n'
> 
> Ok, several things here:
> 
> * The comparison looks weird, you are using the string constant as the
>   first operand. While this is technically not wrong, it is somewhat
>   unpythonic b/c if you read it out loud (''if the empty string is
>   equal to fmtstr'') it somewhat bends the 1:1 mapping between the
>   semantics of your program and the English sentence. It looks like the
>   C hack that is actually unnecessary in python b/c you cannot use the
>   assignment operator as a value (except for a=b=c=0 style
>   assignments).
> 

Yes you are correct, I'm more used to C/C++ and the reason you mention
is why I tend to write comparisons like that.  I'll retrain my fingers
for python from now on.

> * Please don't put multiple statements in one line.
> 

I will keep that in mind for the future.

> * This can be written shorter and more idiomatic (yay keyword
>   arguments):
> 
>     def print_addrs(self, fmtstr='%s    %s\n', query=''):
>         [...]
> 

That was silly of me not to do that in the first place! :-p

> Happy hacking :)
> Justus

Thank you so much for this incredibly informative response.  I learned
a lot.

Cheers,

-- 
Suvayu

Open source is the future. It sets us free.
