unofficial mirror of notmuch@notmuchmail.org
* [python] get all messages of a thread
@ 2011-05-28 13:18 Patrick Totzke
  2011-05-31 23:48 ` Carl Worth
  2011-06-01  5:35 ` Brian May
  0 siblings, 2 replies; 8+ messages in thread
From: Patrick Totzke @ 2011-05-28 13:18 UTC (permalink / raw)
  To: notmuch

[-- Attachment #1: Type: text/plain, Size: 1562 bytes --]

Hi!
I wonder how I would get all messages of a thread with the python
bindings. The doc says one can only use Thread.get_toplevel_messages()
and then must recursively call Message.get_replies().
But look:

---------------- snip -------------------------
#!/usr/bin/python
from notmuch import Database

def get_thread():
    query = Database().create_query('thread:0000000000000c48')
    return query.search_threads().next()
    
def replies(msg):
    acc = []
    r = msg.get_replies()
    if r: #because we cant iterate on NoneType
        for m in r:
            acc.append(m)
            acc += replies(m)
    return acc
    
t=get_thread()

msgs = []
for m in t.get_toplevel_messages():
    msgs.append(m)
    msgs += replies(m)

print msgs
------------------- snap --------------------
gives:

Traceback (most recent call last):
  File "replies.py", line 22, in <module>
    msgs +=replies(m)
  File "replies.py", line 12, in replies
    for m in r:
  File "/usr/local/lib/python2.7/dist-packages/notmuch/message.py", line 140, in next
    raise NotmuchError(STATUS.NOT_INITIALIZED)
notmuch.globals.NotmuchError: Operation on uninitialized object impossible.
---------------------------------------------
It seems that nobody needed this before. Even in bindings/python/notmuch.py
only Thread.get_toplevel_messages() gets called, and then an (undocumented)
Messages.print_messages is used (cf. line 639, in show).

any suggestions?
I would rather not call the notmuch binary and parse its output..
Thanks!
/p

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: [python] get all messages of a thread
  2011-05-28 13:18 [python] get all messages of a thread Patrick Totzke
@ 2011-05-31 23:48 ` Carl Worth
  2011-06-01  5:35 ` Brian May
  1 sibling, 0 replies; 8+ messages in thread
From: Carl Worth @ 2011-05-31 23:48 UTC (permalink / raw)
  To: Patrick Totzke, notmuch

[-- Attachment #1: Type: text/plain, Size: 555 bytes --]

On Sat, 28 May 2011 14:18:05 +0100, Patrick Totzke <patricktotzke@googlemail.com> wrote:
> It seems that nobody needed this before. Even in bindings/python/notmuch.py
> only Thread.get_toplevel_messages() gets called, and then an (undocumented)
> Messages.print_messages is used (cf. line 639, in show).
> 
> any suggestions?

Looks like a bug in the bindings to me.

> I would rather not call the notmuch binary and parse its output..

Of course not!

Python folks, does someone have a quick fix here?

-Carl

-- 
carl.d.worth@intel.com

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]


* Re: [python] get all messages of a thread
  2011-05-28 13:18 [python] get all messages of a thread Patrick Totzke
  2011-05-31 23:48 ` Carl Worth
@ 2011-06-01  5:35 ` Brian May
  2011-06-02  7:05   ` Sebastian Spaeth
  1 sibling, 1 reply; 8+ messages in thread
From: Brian May @ 2011-06-01  5:35 UTC (permalink / raw)
  To: Patrick Totzke; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 1254 bytes --]

On 28 May 2011 23:18, Patrick Totzke <patricktotzke@googlemail.com> wrote:

>    if r: #because we cant iterate on NoneType
>

I don't understand why, but this line sets r._msgs to None. So it crashes,
because it has no message ids to look for.

If you change it to

if r is not None:

... then it works for me.
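
For readers following along, here is a minimal sketch of the traversal with
that one-line change applied. It is written for Python 3 and does not import
notmuch: FakeMessage is a hypothetical stand-in that only imitates the
behaviour described in this thread (get_replies() returns None when there are
no replies), so treat it as an illustration rather than the bindings' API.

```python
class FakeMessage:
    """Hypothetical stand-in for notmuch's Message; not the real class."""

    def __init__(self, msg_id, replies=None):
        self.msg_id = msg_id
        self._replies = replies

    def get_replies(self):
        # Assumption taken from the thread: None when there are no
        # replies, otherwise a one-shot iterator over the replies.
        if not self._replies:
            return None
        return iter(self._replies)


def replies(msg):
    """Return all descendants of msg, depth first."""
    acc = []
    r = msg.get_replies()
    if r is not None:  # identity test: never calls __len__ on r
        for m in r:
            acc.append(m)
            acc += replies(m)
    return acc


def all_messages(toplevel):
    """Flatten a thread, given an iterable of its top-level messages."""
    msgs = []
    for m in toplevel:
        msgs.append(m)
        msgs += replies(m)
    return msgs


thread = [
    FakeMessage("a", [FakeMessage("b", [FakeMessage("c")]), FakeMessage("d")]),
    FakeMessage("e"),
]
ids = [m.msg_id for m in all_messages(thread)]
print(ids)
```

With the real bindings, the list passed to all_messages() would presumably be
the result of get_toplevel_messages() on a thread, as in the original script.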

Oh, I see, for your code, there is an implied call to __len__, and the
__len__ function is completely broken for the reasons described in the
documentation:

      .. note:: As this iterates over the messages, we will not be able to
               iterate over them again! So this will fail::

                 #THIS FAILS
                 msgs = Database().create_query('').search_message()
                 if len(msgs) > 0:              #this 'exhausts' msgs
                     # next line raises NotmuchError(STATUS.NOT_INITIALIZED)!!!
                     for msg in msgs: print msg

               Most of the time, using the
               :meth:`Query.count_messages` is therefore more
               appropriate (and much faster). While not guaranteeing
               that it will return the exact same number than len(),
               in my tests it effectively always did so.
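
The failure mode can be reproduced without notmuch at all. The sketch below
(Python 3) uses a hypothetical OneShot class whose __len__ counts by
iterating, which is the behaviour the note above describes; the RuntimeError
merely stands in for NotmuchError(STATUS.NOT_INITIALIZED). Because the class
defines no truth-value hook of its own, a plain truth test falls back to
__len__ and uses the object up.

```python
class OneShot:
    """Hypothetical one-shot container; models the described behaviour only."""

    def __init__(self, items):
        self._it = iter(items)  # set to None once consumed, like _msgs

    def __iter__(self):
        return self

    def __next__(self):
        if self._it is None:
            raise RuntimeError("operation on uninitialized object")
        try:
            return next(self._it)
        except StopIteration:
            self._it = None
            raise

    def __len__(self):
        # Counting walks the whole iterator and therefore uses it up.
        return sum(1 for _ in self)


r = OneShot(["m1", "m2"])
truthy = bool(r)  # no __bool__ defined, so Python calls __len__
try:
    list_after = [m for m in r]
    error = None
except RuntimeError as exc:
    list_after = None
    error = str(exc)

# Comparing against None does not touch __len__, so iteration still works.
fresh = OneShot(["m1", "m2"])
safe = [m for m in fresh] if fresh is not None else []
```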



-- 
Brian May <brian@microcomaustralia.com.au>

[-- Attachment #2: Type: text/html, Size: 1920 bytes --]


* Re: [python] get all messages of a thread
  2011-06-01  5:35 ` Brian May
@ 2011-06-02  7:05   ` Sebastian Spaeth
  2011-06-02  9:43     ` Brian May
  0 siblings, 1 reply; 8+ messages in thread
From: Sebastian Spaeth @ 2011-06-02  7:05 UTC (permalink / raw)
  To: Brian May, Patrick Totzke; +Cc: notmuch

[-- Attachment #1: Type: text/plain, Size: 1148 bytes --]

On Wed, 1 Jun 2011 15:35:35 +1000, Brian May wrote:
> Oh, I see, for your code, there is an implied call to __len__, and the
> __len__ function is completely broken for the reasons described in the
> documentation:

It seems to have been a bad idea to implement __len__ at all for the
Messages() construct in the python bindings, and I wonder if I should
remove it.

On the other hand, list(Messages()) appears to call len() implicitly,
judging from the error that we get when trying to list() a Messages
object.

An alternative is to implement len() as a call to count_messages(), which
has so far always returned the correct number of messages for me without
using up the iterator. However, the Xapian docs explicitly state that the
count is not guaranteed to be correct, so len() could potentially return
a wrong message count.

What would be the best way to solve this (besides fixing the C api to
allow to reset the iterator ;-) ?)

I could implement a custom .as_list() function that returns the
Messages() object as a list that is guaranteed to be stable, by copying
out the Message() objects into a list.
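
One possible shape for such a helper, sketched in Python 3 as a free function
rather than a method. The name as_list comes from the paragraph above;
everything else, including the Countable test class, is hypothetical. An
explicit loop relies only on the iterator protocol and never asks the object
for its length, so a consuming __len__ is not triggered.

```python
def as_list(one_shot):
    """Copy the items of a one-shot iterable into a plain list."""
    out = []
    for item in one_shot:
        out.append(item)
    return out


class Countable:
    """Hypothetical one-shot iterable whose __len__ would consume it."""

    def __init__(self, items):
        self._it = iter(items)
        self.len_calls = 0  # lets us check that __len__ was never used

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __len__(self):
        self.len_calls += 1
        return sum(1 for _ in self._it)


msgs = Countable(["m1", "m2", "m3"])
stable = as_list(msgs)

# The copy is an ordinary list: len() and repeated iteration are safe.
first_pass = [m for m in stable]
second_pass = [m for m in stable]
```

The cost is that every item is materialised up front, which matters for very
large result sets.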

Sebastian

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]


* Re: [python] get all messages of a thread
  2011-06-02  7:05   ` Sebastian Spaeth
@ 2011-06-02  9:43     ` Brian May
  2011-06-02 14:20       ` Sebastian Spaeth
  0 siblings, 1 reply; 8+ messages in thread
From: Brian May @ 2011-06-02  9:43 UTC (permalink / raw)
  To: notmuch

[-- Attachment #1: Type: text/plain, Size: 1492 bytes --]

On 2 June 2011 17:05, Sebastian Spaeth <Sebastian@sspaeth.de> wrote:

> What would be the best way to solve this (besides fixing the C api to
> allow to reset the iterator ;-) ?)
>
>
I am not really familiar with the code. So am I correct in making the
following assumptions?

* It is not easy to fix the C api to reset the iterator (what about
repeating the search?)

* The only accurate way to get the number of messages is to iterate through
every search result and count them?

If so, then len(...) I think might be very slow if there are a large number
of elements.

Maybe it might be easier/better to implement object.__nonzero__(self)
instead of the object.__len__(self) method?

http://docs.python.org/reference/datamodel.html

object.__nonzero__(self)
Called to implement truth value testing and the built-in operation bool();
should return False or True, or their integer equivalents 0 or 1. When this
method is not defined, __len__() is called, if it is defined, and the object
is considered true if its result is nonzero. If a class defines neither
__len__() nor __nonzero__(), all its instances are considered true.

object.__len__(self)
Called to implement the built-in function len(). Should return the length of
the object, an integer >= 0. Also, an object that doesn’t define a
__nonzero__() method and whose __len__() method returns zero is considered
to be false in a Boolean context.
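
To illustrate the idea, here is a small Python 3 sketch of a one-shot iterable
with a truth test that does not use up the iterator. Note that Python 3 names
the hook __bool__; __nonzero__ is the Python 2 spelling, so the sketch defines
both. PeekableOneShot and its peek-and-buffer approach are assumptions made
for this example and say nothing about how the bindings would implement it.

```python
class PeekableOneShot:
    """Hypothetical one-shot iterable with a non-consuming truth test."""

    _EMPTY = object()  # sentinel meaning "nothing buffered"

    def __init__(self, items):
        self._it = iter(items)
        self._peeked = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        # Hand out a buffered item first, if the truth test fetched one.
        if self._peeked is not self._EMPTY:
            item, self._peeked = self._peeked, self._EMPTY
            return item
        return next(self._it)

    def __bool__(self):
        # Fetch the next item once and keep it buffered, so that the
        # following __next__ call still returns it.
        if self._peeked is self._EMPTY:
            try:
                self._peeked = next(self._it)
            except StopIteration:
                return False
        return True

    __nonzero__ = __bool__  # Python 2 spelling of the same hook


t = PeekableOneShot(["t1", "t2"])
seen = [x for x in t] if t else []
empty = PeekableOneShot([])
empty_is_true = bool(empty)
```

Since the class defines no __len__, nothing here can trigger a full pass over
the results by accident.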

-- 
Brian May <brian@microcomaustralia.com.au>

[-- Attachment #2: Type: text/html, Size: 2088 bytes --]


* Re: [python] get all messages of a thread
  2011-06-02  9:43     ` Brian May
@ 2011-06-02 14:20       ` Sebastian Spaeth
  2011-06-02 14:37         ` Sebastian Spaeth
  2011-06-02 14:38         ` Austin Clements
  0 siblings, 2 replies; 8+ messages in thread
From: Sebastian Spaeth @ 2011-06-02 14:20 UTC (permalink / raw)
  To: Brian May, notmuch

[-- Attachment #1: Type: text/plain, Size: 2467 bytes --]

On Thu, 2 Jun 2011 19:43:29 +1000, Brian May wrote:
> On 2 June 2011 17:05, Sebastian Spaeth <Sebastian@sspaeth.de> wrote:
> 
> > What would be the best way to solve this (besides fixing the C api to
> > allow to reset the iterator ;-) ?)

> * It is not easy to fix the C api to reset the iterator (what about
> repeating the search?)

I am not sure about the difficulty of that, I am not a C-kind of
guy. Repeating the search would be easy but potentially gives you
different results since the db could have changed since then.
 
> * The only accurate way to get the number of messages is to iterate through
> every search result and count them?

There is count_messages(), which wraps notmuch_query_count_messages, which
in turn invokes some very quick Xapian function. But the Xapian docs
explicitly state that it is Xapian's best guess and not guaranteed to be
the real number of messages (although it always was in my attempts). And
you don't want len() to return an approximation of the iterator length...

> If so, then len(...) I think might be very slow if there are a large number
> of elements.
 
> Maybe it might be easier/better to implement object.__nonzero__(self)
>  instead of the object.__len__(self) method?
> 
> http://docs.python.org/reference/datamodel.html
> 
> object.__nonzero__(self)
> Called to implement truth value testing and the built-in operation bool();
> should return False or True, or their integer equivalents 0 or 1. When this
> method is not defined, __len__() is called, if it is defined, and the object
> is considered true if its result is nonzero. If a class defines neither
> __len__() nor __nonzero__(), all its instances are considered true.

Interesting, did not know about this one. I guess that would solve the
example with the:

t = query.search_threads()

if t:
  for thread in t:
    print thread

Actually the "if t:" is no longer needed, I just tried a query returning
no Threads and 

for thread in threads:
  pass

works just fine with an empty Threads() object.

I made the change, and implemented __nonzero__ and removed the len()
method. It just doesn't make sense on 1-time iterators. (I documented
the change in the API docs). Sorry if this breaks existing code.

list(Threads()) works just fine too, it just took a while to create a
list of 13k Thread() objects on this laptop. (and list() will of course
not return until it is finished).


Sebastian

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]


* Re: [python] get all messages of a thread
  2011-06-02 14:20       ` Sebastian Spaeth
@ 2011-06-02 14:37         ` Sebastian Spaeth
  2011-06-02 14:38         ` Austin Clements
  1 sibling, 0 replies; 8+ messages in thread
From: Sebastian Spaeth @ 2011-06-02 14:37 UTC (permalink / raw)
  To: Brian May, notmuch

[-- Attachment #1: Type: text/plain, Size: 585 bytes --]

On Thu, 02 Jun 2011 16:20:14 +0200, Sebastian Spaeth wrote:

> I made the change, and implemented __nonzero__ and removed the len()
> method. It just doesn't make sense on 1-time iterators. (I documented
> the change in the API docs). Sorry if this breaks existing code.

FYI

OK, I just pushed a change that adds the __nonzero__ function thus
making:

t=Threads()
if t:
  len(t)

work, but I did not pull the len() function out. The explicit "if t"
test is, however, not needed, since

t = q.search_threads()
for thread in t:
  print thread

works just fine with empty results.

Sebastian

[-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --]


* Re: [python] get all messages of a thread
  2011-06-02 14:20       ` Sebastian Spaeth
  2011-06-02 14:37         ` Sebastian Spaeth
@ 2011-06-02 14:38         ` Austin Clements
  1 sibling, 0 replies; 8+ messages in thread
From: Austin Clements @ 2011-06-02 14:38 UTC (permalink / raw)
  To: Sebastian Spaeth; +Cc: notmuch

On Thu, Jun 2, 2011 at 10:20 AM, Sebastian Spaeth <Sebastian@sspaeth.de> wrote:
> On Thu, 2 Jun 2011 19:43:29 +1000, Brian May wrote:
>> On 2 June 2011 17:05, Sebastian Spaeth <Sebastian@sspaeth.de> wrote:
>>
>> > What would be the best way to solve this (besides fixing the C api to
>> > allow to reset the iterator ;-) ?)
>
>> * It is not easy to fix the C api to reset the iterator (what about
>> repeating the search?)
>
> I am not sure about the difficulty of that, I am not a C-kind of
> guy. Repeating the search would be easy but potentially gives you
> different results since the db could have changed since then.

Not too hard.  Here's an utterly untested patch that implements
iterator resetting for notmuch_messages_t iterators.  It *should* be
much more efficient than performing the query again, but if you use
it, I'd love to know if that's actually true.

This may not be useful if __len__ is gone, unless you really want to
turn Messages/Threads into iterators rather than generators (as I've
pointed out before, there is absolutely nothing unusual or un-Pythonic
about how Messages/Threads works right now [well, except for the
presence of __len__ in a generator, I suppose]).

diff --git a/lib/messages.c b/lib/messages.c
index 7bcd1ab..085691c 100644
--- a/lib/messages.c
+++ b/lib/messages.c
@@ -80,7 +80,8 @@ _notmuch_messages_create (notmuch_message_list_t *list)
 	return NULL;

     messages->is_of_list_type = TRUE;
-    messages->iterator = list->head;
+    messages->head = list->head;
+    notmuch_messages_reset (messages);

     return messages;
 }
@@ -137,6 +138,15 @@ notmuch_messages_move_to_next (notmuch_messages_t *messages)
 }

 void
+notmuch_messages_reset (notmuch_messages_t *messages)
+{
+    if (! messages->is_of_list_type)
+	return _notmuch_mset_messages_reset (messages);
+
+    messages->iterator = messages->head;
+}
+
+void
 notmuch_messages_destroy (notmuch_messages_t *messages)
 {
     talloc_free (messages);
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 02e24ee..805d60c 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -413,6 +413,7 @@ typedef struct _notmuch_message_list {
  */
 struct visible _notmuch_messages {
     notmuch_bool_t is_of_list_type;
+    notmuch_message_node_t *head;
     notmuch_message_node_t *iterator;
 };

@@ -441,6 +442,9 @@ _notmuch_mset_messages_get (notmuch_messages_t *messages);
 void
 _notmuch_mset_messages_move_to_next (notmuch_messages_t *messages);

+void
+_notmuch_mset_messages_reset (notmuch_messages_t *messages);
+
 notmuch_bool_t
 _notmuch_doc_id_set_contains (notmuch_doc_id_set_t *doc_ids,
                               unsigned int doc_id);
diff --git a/lib/notmuch.h b/lib/notmuch.h
index 9cdcec0..044cfaa 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -734,6 +734,15 @@ notmuch_messages_get (notmuch_messages_t *messages);
 void
 notmuch_messages_move_to_next (notmuch_messages_t *messages);

+/* Reset the 'messages' iterator back to the first message.
+ *
+ * For iterators returned from notmuch_query_search_messages, this is
+ * both more efficient than performing the query a second time and
+ * guaranteed to result in the same messages as the first iteration.
+ */
+void
+notmuch_messages_reset (notmuch_messages_t *messages);
+
 /* Destroy a notmuch_messages_t object.
  *
  * It's not strictly necessary to call this function. All memory from
diff --git a/lib/query.cc b/lib/query.cc
index 6f02b04..1e75be0 100644
--- a/lib/query.cc
+++ b/lib/query.cc
@@ -32,6 +32,7 @@ struct _notmuch_query {
 typedef struct _notmuch_mset_messages {
     notmuch_messages_t base;
     notmuch_database_t *notmuch;
+    Xapian::MSet mset;
     Xapian::MSetIterator iterator;
     Xapian::MSetIterator iterator_end;
 } notmuch_mset_messages_t;
@@ -128,6 +129,7 @@ notmuch_query_search_messages (notmuch_query_t *query)
 	messages->base.is_of_list_type = FALSE;
 	messages->base.iterator = NULL;
 	messages->notmuch = notmuch;
+	new (&messages->mset) Xapian::MSet ();
 	new (&messages->iterator) Xapian::MSetIterator ();
 	new (&messages->iterator_end) Xapian::MSetIterator ();

@@ -181,8 +183,8 @@ notmuch_query_search_messages (notmuch_query_t *query)

 	mset = enquire.get_mset (0, notmuch->xapian_db->get_doccount ());

-	messages->iterator = mset.begin ();
-	messages->iterator_end = mset.end ();
+	messages->mset = mset;
+	_notmuch_mset_messages_reset (&messages->base);

 	return &messages->base;

@@ -257,6 +259,17 @@ _notmuch_mset_messages_move_to_next (notmuch_messages_t *messages)
     mset_messages->iterator++;
 }

+void
+_notmuch_mset_messages_reset (notmuch_messages_t *messages)
+{
+    notmuch_mset_messages_t *mset_messages;
+
+    mset_messages = (notmuch_mset_messages_t *) messages;
+
+    mset_messages->iterator = mset_messages->mset.begin ();
+    mset_messages->iterator_end = mset_messages->mset.end ();
+}
+
 static notmuch_bool_t
 _notmuch_doc_id_set_init (void *ctx,
 			  notmuch_doc_id_set_t *doc_ids,
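
For readers who do not read C, the idea behind the list-backed half of the
patch can be modelled in a few lines of Python 3. This is only a conceptual
sketch: ResettableMessages is a hypothetical class, not part of the bindings,
and the Xapian-backed case (keeping the MSet around and taking begin()/end()
again) is not modelled.

```python
class ResettableMessages:
    """Hypothetical Python model of the patch's list-backed case."""

    def __init__(self, items):
        self._head = list(items)  # kept for the lifetime of the object
        self.reset()

    def reset(self):
        # Equivalent in spirit to `messages->iterator = messages->head`.
        self._iterator = iter(self._head)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._iterator)


msgs = ResettableMessages(["m1", "m2"])
count = sum(1 for _ in msgs)        # first pass uses up the iterator
exhausted = [m for m in msgs]       # nothing left without a reset
msgs.reset()
second_pass = [m for m in msgs]     # same items again after reset()
```

The design point is the same as in the diff: remember the starting position
separately from the moving cursor, so that rewinding is a cheap assignment
rather than a repeated query.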


