* [python] get all messages of a thread
From: Patrick Totzke @ 2011-05-28 13:18 UTC
To: notmuch

Hi!

I wonder how I would get all messages of a thread with the python bindings.
The doc says one can only use Thread.get_toplevel_messages() and then
must recursively call Message.get_replies(). But look:

---------------- snip -------------------------
#/usr/bin/python
from notmuch import Database

def get_thread():
    query = Database().create_query('thread:0000000000000c48')
    return query.search_threads().next()

def replies(msg):
    acc = []
    r = msg.get_replies()
    if r: #because we cant iterate on NoneType
        for m in r:
            acc.append(m)
            acc += replies(m)
    return acc

t=get_thread()
msgs = []
for m in t.get_toplevel_messages():
    msgs.append(m)
    msgs += replies(m)
print msgs
------------------- snap --------------------

gives:

Traceback (most recent call last):
  File "replies.py", line 22, in <module>
    msgs +=replies(m)
  File "replies.py", line 12, in replies
    for m in r:
  File "/usr/local/lib/python2.7/dist-packages/notmuch/message.py", line 140, in next
    raise NotmuchError(STATUS.NOT_INITIALIZED)
notmuch.globals.NotmuchError: Operation on uninitialized object impossible.
---------------------------------------------

It seems that nobody needed this before. Even in bindings/python/notmuch.py
only Threads.get_toplevel_messages() gets called, and then a (undocumented)
Messages.print_messages is used (cf line 639, in show)

any suggestions?
I would rather not call the notmuch binary and parse its output..

Thanks!
/p

* Re: [python] get all messages of a thread
From: Carl Worth @ 2011-05-31 23:48 UTC
To: Patrick Totzke, notmuch

On Sat, 28 May 2011 14:18:05 +0100, Patrick Totzke <patricktotzke@googlemail.com> wrote:
> It seems that nobody needed this before. Even in bindings/python/notmuch.py
> only Threads.get_toplevel_messages() gets called, and then a (undocumented)
> Messages.print_messages is used (cf line 639, in show)
>
> any suggestions?

Looks like a bug in the bindings to me.

> I would rather not call the notmuch binary and parse its output..

Of course not!

Python folks, does someone have a quick fix here?

-Carl

--
carl.d.worth@intel.com

* Re: [python] get all messages of a thread
From: Brian May @ 2011-06-01 5:35 UTC
To: Patrick Totzke; +Cc: notmuch

On 28 May 2011 23:18, Patrick Totzke <patricktotzke@googlemail.com> wrote:

>     if r: #because we cant iterate on NoneType
>

I don't understand why, but this line sets r._msgs to None. So it crashes,
because it has no message ids to look for.

If you change it to

    if r is not None:
        ...

then it works for me.

Oh, I see, for your code, there is an implied call to __len__, and the
__len__ function is completely broken for the reasons described in the
documentation:

    .. note:: As this iterates over the messages, we will not be able to
       iterate over them again! So this will fail::

         #THIS FAILS
         msgs = Database().create_query('').search_message()
         if len(msgs) > 0:              #this 'exhausts' msgs
             # next line raises NotmuchError(STATUS.NOT_INITIALIZED)!!!
             for msg in msgs: print msg

       Most of the time, using the :meth:`Query.count_messages` is
       therefore more appropriate (and much faster). While not
       guaranteeing that it will return the exact same number than
       len(), in my tests it effectively always did so.

--
Brian May <brian@microcomaustralia.com.au>

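The failure Brian describes can be reproduced without notmuch at all. The sketch below is only a model under stated assumptions: `OneShotMessages` is a hypothetical stand-in, not the bindings' `Messages` class, and it raises `RuntimeError` where the bindings raise `NotmuchError`. It is written for Python 3 (the thread itself uses Python 2), where a truth test also falls back to `__len__` when no dedicated truth method is defined.

```python
class OneShotMessages:
    """Hypothetical stand-in for a one-shot iterator whose __len__ consumes it."""

    def __init__(self, items):
        self._source = iter(items)

    def __iter__(self):
        return self

    def __next__(self):
        if self._source is None:
            # Models the "uninitialized object" error from the traceback.
            raise RuntimeError("Operation on uninitialized object impossible.")
        try:
            return next(self._source)
        except StopIteration:
            self._source = None  # the underlying handle is gone after one pass
            raise

    def __len__(self):
        # Counting walks the iterator to its end, so nothing is left afterwards.
        return sum(1 for _ in self)


def iterate_after_truth_test(items):
    r = OneShotMessages(items)
    if r:  # implied call to __len__, which exhausts r
        return [m for m in r]  # raises RuntimeError
    return []


def iterate_after_none_test(items):
    r = OneShotMessages(items)
    if r is not None:  # identity test only, r is left untouched
        return [m for m in r]
    return []
```

With this model, `iterate_after_none_test` returns every item, while `iterate_after_truth_test` fails on the first step of the loop, which matches the behaviour reported in the thread.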
* Re: [python] get all messages of a thread
From: Sebastian Spaeth @ 2011-06-02 7:05 UTC
To: Brian May, Patrick Totzke; +Cc: notmuch

On Wed, 1 Jun 2011 15:35:35 +1000, Brian May wrote:
> Oh, I see, for your code, there is an implied call to __len__, and the
> __len__ function is completely broken for the reasons described in the
> documentation:

It seems to have been a bad idea to implement __len__ at all for the
Messages() construct in the python bindings, and I wonder if I should
remove it. On the other hand, it seems that list(Messages()) implicitly
calls len(), judging from the error that we get when trying to list()
a messages object.

An alternative is to implement len() as a call to count_messages(), which
has for me so far always returned the correct number of messages without
using up the iterator. However, the xapian docs explicitly state that
the count is not guaranteed to be correct, so len() might (potentially)
return a wrong number of messages.

What would be the best way to solve this (besides fixing the C api to
allow to reset the iterator ;-) ?)

I could implement a custom .as_list() function that returns the
Messages() object as a list that is guaranteed to be stable, by copying
out the Message() objects into a list.

Sebastian

* Re: [python] get all messages of a thread
From: Brian May @ 2011-06-02 9:43 UTC
To: notmuch

On 2 June 2011 17:05, Sebastian Spaeth <Sebastian@sspaeth.de> wrote:

> What would be the best way to solve this (besides fixing the C api to
> allow to reset the iterator ;-) ?)
>

I am not really familiar with the code. So am I correct in making the
following assumptions?

* It is not easy to fix the C api to reset the iterator (what about
  repeating the search?)

* The only accurate way to get the number of messages is to iterate through
  every search result and count them?

If so, then len(...) I think might be very slow if there are a large number
of elements.

Maybe it might be easier/better to implement object.__nonzero__(self)
instead of the object.__len__(self) method?

http://docs.python.org/reference/datamodel.html

object.__nonzero__(self)
    Called to implement truth value testing and the built-in operation
    bool(); should return False or True, or their integer equivalents 0
    or 1. When this method is not defined, __len__() is called, if it is
    defined, and the object is considered true if its result is nonzero.
    If a class defines neither __len__() nor __nonzero__(), all its
    instances are considered true.

object.__len__(self)
    Called to implement the built-in function len(). Should return the
    length of the object, an integer >= 0. Also, an object that doesn’t
    define a __nonzero__() method and whose __len__() method returns zero
    is considered to be false in a Boolean context.

--
Brian May <brian@microcomaustralia.com.au>

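Brian's suggestion can be illustrated with a small model. This is a sketch under assumptions, not the bindings' code: `PeekableThreads` is a hypothetical class, and the one-item lookahead it uses is just one way to answer a truth test without consuming results (the thread does not show how the bindings implement it). In Python 3 the method is named `__bool__`; the alias keeps the Python 2 name `__nonzero__` from the quoted documentation.

```python
class PeekableThreads:
    """Hypothetical one-shot iterator with a truth test that consumes nothing."""

    _EMPTY = object()  # marker meaning "no item buffered"

    def __init__(self, items):
        self._source = iter(items)
        self._peeked = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        if self._peeked is not self._EMPTY:
            # Hand out the buffered item first so the truth test loses nothing.
            item, self._peeked = self._peeked, self._EMPTY
            return item
        return next(self._source)

    def __bool__(self):
        if self._peeked is self._EMPTY:
            try:
                self._peeked = next(self._source)  # look ahead by one item
            except StopIteration:
                return False
        return True

    __nonzero__ = __bool__  # Python 2 spelling of the same hook
```

Because no `__len__` is defined, `if t:` uses `__bool__` only, and a following `for thread in t:` still sees every element, including the one that was peeked.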
* Re: [python] get all messages of a thread
From: Sebastian Spaeth @ 2011-06-02 14:20 UTC
To: Brian May, notmuch

On Thu, 2 Jun 2011 19:43:29 +1000, Brian May wrote:
> On 2 June 2011 17:05, Sebastian Spaeth <Sebastian@sspaeth.de> wrote:
>
> > What would be the best way to solve this (besides fixing the C api to
> > allow to reset the iterator ;-) ?)

> * It is not easy to fix the C api to reset the iterator (what about
> repeating the search?)

I am not sure about the difficulty of that, I am not a C-kind of
guy. Repeating the search would be easy but potentially gives you
different results since the db could have changed since then.

> * The only accurate way to get the number of messages is to iterate through
> every search result and count them?

There is count_messages(), which wraps notmuch_query_count_messages, which
invokes some very quick xapian function. But the xapian docs explicitly
state that it's Xapian's best guess and not guaranteed to be the real
number of messages (although it always was in my attempts). And you
don't want len() to return an approximation of the iterator length....

> If so, then len(...) I think might be very slow if there are a large number
> of elements.

> Maybe it might be easier/better to implement object.__nonzero__(self)
> instead of the object.__len__(self) method?
>
> http://docs.python.org/reference/datamodel.html
>
> object.__nonzero__(self)
> Called to implement truth value testing and the built-in operation bool();
> should return False or True, or their integer equivalents 0 or 1. When this
> method is not defined, __len__() is called, if it is defined, and the object
> is considered true if its result is nonzero. If a class defines neither
> __len__() nor __nonzero__(), all its instances are considered true.

Interesting, did not know about this one. I guess that would solve the
example with the:

t = query.search_threads()
if t:
    for thread in t:
        print thread

Actually the "if t:" is no longer needed, I just tried a query returning
no Threads and

for thread in threads:
    pass

works just fine with an empty Threads() object.

I made the change, and implemented __nonzero__ and removed the len()
method. It just doesn't make sense on 1-time iterators. (I documented
the change in the API docs). Sorry if this breaks existing code.

list(Threads()) works just fine too, it just took a while to create a
list of 13k Thread() objects on this laptop. (and list() will of course
not return until it is finished).

Sebastian

* Re: [python] get all messages of a thread
From: Sebastian Spaeth @ 2011-06-02 14:37 UTC
To: Brian May, notmuch

On Thu, 02 Jun 2011 16:20:14 +0200, Sebastian Spaeth wrote:
> I made the change, and implemented __nonzero__ and removed the len()
> method. It just doesn't make sense on 1-time iterators. (I documented
> the change in the API docs). Sorry if this breaks existing code.

FYI

OK, I just pushed a change that adds the __nonzero__ function thus
making:

t=Threads()
if t: len(t)

work, but I did not pull the len() function out. The explicit "if t"
test is however not needed, because

t = q.search_threads()
for thread in t: print thread

works just fine with empty results.

Sebastian

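Putting Brian's `is not None` suggestion back into Patrick's original script gives a traversal along the following lines. This is a sketch rather than a tested fix for the bindings: the method names `get_toplevel_messages()` and `get_replies()` come from the thread, while `FakeMessage` and `FakeThread` are hypothetical stand-ins so the example runs without a notmuch database. The stand-in returns `None` when a message has no replies, as Patrick's code comment implies the bindings do.

```python
def replies(msg):
    """Return all direct and indirect replies to msg, depth first."""
    acc = []
    r = msg.get_replies()
    if r is not None:  # identity test: does not trigger __len__ on the iterator
        for m in r:
            acc.append(m)
            acc += replies(m)
    return acc


def all_messages(thread):
    """Return every message of a thread, top-level messages and their replies."""
    msgs = []
    for m in thread.get_toplevel_messages():
        msgs.append(m)
        msgs += replies(m)
    return msgs


class FakeMessage:
    """Hypothetical stand-in exposing only get_replies()."""

    def __init__(self, name, children=()):
        self.name = name
        self._children = list(children)

    def get_replies(self):
        # None for "no replies", otherwise a fresh one-shot iterator.
        return iter(self._children) if self._children else None


class FakeThread:
    """Hypothetical stand-in exposing only get_toplevel_messages()."""

    def __init__(self, toplevel):
        self._toplevel = list(toplevel)

    def get_toplevel_messages(self):
        return iter(self._toplevel)
```

The two functions only rely on the two method names, so they should apply unchanged to objects from the real bindings, assuming those behave as described in the thread.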
* Re: [python] get all messages of a thread
From: Austin Clements @ 2011-06-02 14:38 UTC
To: Sebastian Spaeth; +Cc: notmuch

On Thu, Jun 2, 2011 at 10:20 AM, Sebastian Spaeth <Sebastian@sspaeth.de> wrote:
> On Thu, 2 Jun 2011 19:43:29 +1000, Brian May wrote:
>> On 2 June 2011 17:05, Sebastian Spaeth <Sebastian@sspaeth.de> wrote:
>>
>> > What would be the best way to solve this (besides fixing the C api to
>> > allow to reset the iterator ;-) ?)
>
>> * It is not easy to fix the C api to reset the iterator (what about
>> repeating the search?)
>
> I am not sure about the difficulty of that, I am not a C-kind of
> guy. Repeating the search would be easy but potentially gives you
> different results since the db could have changed since then.

Not too hard. Here's an utterly untested patch that implements
iterator resetting for notmuch_messages_t iterators. It *should* be
much more efficient than performing the query again, but if you use
it, I'd love to know if that's actually true.

This may not be useful if __len__ is gone, unless you really want to
turn Messages/Threads into iterators rather than generators (as I've
pointed out before, there is absolutely nothing unusual or
un-Pythonic about how Messages/Threads works right now [well, except
for the presence of __len__ in a generator, I suppose]).

diff --git a/lib/messages.c b/lib/messages.c
index 7bcd1ab..085691c 100644
--- a/lib/messages.c
+++ b/lib/messages.c
@@ -80,7 +80,8 @@ _notmuch_messages_create (notmuch_message_list_t *list)
        return NULL;
 
     messages->is_of_list_type = TRUE;
-    messages->iterator = list->head;
+    messages->head = list->head;
+    notmuch_messages_reset (messages);
 
     return messages;
 }
@@ -137,6 +138,15 @@ notmuch_messages_move_to_next (notmuch_messages_t *messages)
 }
 
 void
+notmuch_messages_reset (notmuch_messages_t *messages)
+{
+    if (! messages->is_of_list_type)
+        return _notmuch_mset_messages_reset (messages);
+
+    messages->iterator = messages->head;
+}
+
+void
 notmuch_messages_destroy (notmuch_messages_t *messages)
 {
     talloc_free (messages);
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 02e24ee..805d60c 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -413,6 +413,7 @@ typedef struct _notmuch_message_list {
  */
 struct visible _notmuch_messages {
     notmuch_bool_t is_of_list_type;
+    notmuch_message_node_t *head;
     notmuch_message_node_t *iterator;
 };
 
@@ -441,6 +442,9 @@ _notmuch_mset_messages_get (notmuch_messages_t *messages);
 void
 _notmuch_mset_messages_move_to_next (notmuch_messages_t *messages);
 
+void
+_notmuch_mset_messages_reset (notmuch_messages_t *messages);
+
 notmuch_bool_t
 _notmuch_doc_id_set_contains (notmuch_doc_id_set_t *doc_ids,
                               unsigned int doc_id);
diff --git a/lib/notmuch.h b/lib/notmuch.h
index 9cdcec0..044cfaa 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -734,6 +734,15 @@ notmuch_messages_get (notmuch_messages_t *messages);
 void
 notmuch_messages_move_to_next (notmuch_messages_t *messages);
 
+/* Reset the 'messages' iterator back to the first message.
+ *
+ * For iterators returned from notmuch_query_search_messages, this is
+ * both more efficient than performing the query a second time and
+ * guaranteed to result in the same messages as the first iteration.
+ */
+void
+notmuch_messages_reset (notmuch_messages_t *messages);
+
 /* Destroy a notmuch_messages_t object.
  *
  * It's not strictly necessary to call this function. All memory from
diff --git a/lib/query.cc b/lib/query.cc
index 6f02b04..1e75be0 100644
--- a/lib/query.cc
+++ b/lib/query.cc
@@ -32,6 +32,7 @@ struct _notmuch_query {
 typedef struct _notmuch_mset_messages {
     notmuch_messages_t base;
     notmuch_database_t *notmuch;
+    Xapian::MSet mset;
     Xapian::MSetIterator iterator;
     Xapian::MSetIterator iterator_end;
 } notmuch_mset_messages_t;
@@ -128,6 +129,7 @@ notmuch_query_search_messages (notmuch_query_t *query)
        messages->base.is_of_list_type = FALSE;
        messages->base.iterator = NULL;
        messages->notmuch = notmuch;
+       new (&messages->mset) Xapian::MSet ();
        new (&messages->iterator) Xapian::MSetIterator ();
        new (&messages->iterator_end) Xapian::MSetIterator ();
 
@@ -181,8 +183,8 @@ notmuch_query_search_messages (notmuch_query_t *query)
 
        mset = enquire.get_mset (0, notmuch->xapian_db->get_doccount ());
 
-       messages->iterator = mset.begin ();
-       messages->iterator_end = mset.end ();
+       messages->mset = mset;
+       _notmuch_mset_messages_reset (&messages->base);
 
        return &messages->base;
 
@@ -257,6 +259,17 @@ _notmuch_mset_messages_move_to_next (notmuch_messages_t *messages)
     mset_messages->iterator++;
 }
 
+void
+_notmuch_mset_messages_reset (notmuch_messages_t *messages)
+{
+    notmuch_mset_messages_t *mset_messages;
+
+    mset_messages = (notmuch_mset_messages_t *) messages;
+
+    mset_messages->iterator = mset_messages->mset.begin ();
+    mset_messages->iterator_end = mset_messages->mset.end ();
+}
+
 static notmuch_bool_t
 _notmuch_doc_id_set_init (void *ctx,
                           notmuch_doc_id_set_t *doc_ids,

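The idea behind the patch, keeping a reference to the start of the result set so the cursor can be rewound, can be modelled in a few lines of Python. This is only an illustration of the technique under assumptions: `ResettableMessages` and its `reset()` method are hypothetical names and are not part of the Python bindings, and the stored list plays the role of `messages->head` (or the saved `Xapian::MSet`) in the patch.

```python
class ResettableMessages:
    """Hypothetical model of an iterator that remembers its starting point."""

    def __init__(self, items):
        self._head = list(items)  # retained results, like messages->head
        self._pos = 0             # cursor, like messages->iterator

    def reset(self):
        # Rewind to the first element without repeating the search.
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._head):
            raise StopIteration
        item = self._head[self._pos]
        self._pos += 1
        return item
```

After one full pass the iterator is empty, as with the current one-shot behaviour, but a call to `reset()` makes the same results available again, which is the guarantee the patch's header comment describes.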