unofficial mirror of notmuch@notmuchmail.org
From: Patrick Totzke <patricktotzke@googlemail.com>
To: Austin Clements <amdragon@mit.edu>
Cc: notmuch@notmuchmail.org
Subject: Re: one-time-iterators
Date: Thu, 26 May 2011 22:47:42 +0100
Message-ID: <1306446359-sup-9475@brick>
In-Reply-To: <BANLkTi=3mQYJft4s9jGaoqSbcJvqhmZXyQ@mail.gmail.com>

Hehe, I did it again (dropped the list from Cc). I need to stop
using sup :P Thanks, Austin.

Excerpts from Carl Worth's message of Thu May 26 18:20:21 +0100 2011:
> On Thu, 26 May 2011 09:31:19 +0100, Patrick Totzke <patricktotzke@googlemail.com> wrote:
> > Wow. This reads really complicated. All I want to say is:
> > if I change tags in my search-results view, I get Xapian errors :)
> 
> Yes, that's frustrating. I wish that we had a more reliable interface at
> the notmuch library level. But I'm not entirely sure what would be the
> best way to do this.
Actually, I expected something like this. For this reason, each sup instance
locks its index.
At the moment I'm going for custom wrapper classes around notmuch.Thread
and notmuch.Messages that cache the results of the calls relevant to me.
But the real issue seems to be the iterator:
it takes an awful lot of time just to copy the thread ids of all threads from
a large query result.
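A caching wrapper of the kind mentioned above might look like the following minimal sketch. CachedThread is a hypothetical name, and the getters it calls (get_thread_id, get_subject, get_authors, get_tags) are assumed to match the notmuch.Thread methods of the bindings in use; check them against your version. Copying eagerly keeps the snapshot valid after tags change, but it also pays the full per-thread cost up front, which is exactly the cost the timing below measures.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class CachedThread:
    """Hypothetical snapshot of the thread fields a UI needs."""

    thread_id: str
    subject: str
    authors: str
    tags: FrozenSet[str]

    @classmethod
    def from_thread(cls, thread) -> "CachedThread":
        # Copy everything while the underlying object is still valid;
        # later reads never go back to the database.
        return cls(
            thread_id=thread.get_thread_id(),
            subject=thread.get_subject(),
            authors=thread.get_authors(),
            # Iterate explicitly: tag collections may be one-time iterators too.
            tags=frozenset(tag for tag in thread.get_tags()),
        )
```

Usage would be along the lines of `threads = [CachedThread.from_thread(t) for t in q.search_threads()]`.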

I tried the following in ipython:
 from notmuch import Database
 q = Database().create_query('*')
 %time tids = [t.get_thread_id() for t in q.search_threads()]

which results in
CPU times: user 7.64 s, sys: 2.06 s, total: 9.70 s
Wall time: 9.84 s

It would really help if the Query object could return an iterator
of thread ids that makes this copying unnecessary. Is it possible to
implement this? Or would it require the same amount of copying
at a lower level?
I have not looked into the code for the bindings or the C code so far,
but I guess Query.search_threads() translates to something like
"SELECT id,morestuff from threads"
where for me a "SELECT id from threads" would totally suffice. Copying
(in the C code) only the ids into some list that yields an iterator should be faster.

> > The question: How do you solve this in the emacs code?
> > do you store all tids of a query? 
> 
> The emacs code does not use the notmuch library interface like your
> python bindings do. Instead, it uses the notmuch command-line tool, (and
> buffers up the text output by it). 
Ahh ok. Thanks for the explanation.

Excerpts from Austin Clements's message of Thu May 26 21:18:53 +0100 2011:
> I proposed a solution to this problem a while ago
> (id:"AANLkTi=KOx8aTJipkiArFVjEHE6zt_JypoASMiiAWBZ6@mail.gmail.com"),
> though I haven't tried implementing it yet.
Sorry, I wasn't on the list back then.

> Though, Patrick, that solution doesn't address your problem.  On the
> other hand, it's not clear to me what concurrent access semantics
> you're actually expecting.  I suspect you don't want the remaining
> iteration to reflect the changes, since your changes could equally
> well have affected earlier iteration results. 
That's right. 
> But if you want a
> consistent view of your query results, something's going to have to
> materialize that iterator, and it might as well be you (or Xapian
> would need more sophisticated concurrency control than it has).  But
> this shouldn't be expensive because all you need to materialize are
> the document ids; you shouldn't need to eagerly fetch the per-thread
> information.  
I thought so, but it seems that Query.search_threads() already
caches more than the id of each item. That is to be expected,
since it is designed to return thread objects, not their ids.
As you can see above, this _is_ too expensive for me.
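The approach Austin describes, materializing only the ids and fetching per-thread details on demand, can be sketched roughly as follows. LazyThreadList is a hypothetical helper, not part of the notmuch bindings, and `fetch` is an assumed callback; one option would be a function that runs a fresh query for `thread:<id>` and extracts the fields it needs.

```python
from typing import Callable, Dict, Iterable, List


class LazyThreadList:
    """Hypothetical helper: hold thread ids eagerly, fetch details lazily."""

    def __init__(self, thread_ids: Iterable[str], fetch: Callable[[str], dict]):
        # Materialize only the ids. A comprehension is used rather than
        # list() so no length-hint call can consume a one-shot source.
        self._ids: List[str] = [tid for tid in thread_ids]
        self._fetch = fetch
        self._cache: Dict[str, dict] = {}

    def __len__(self) -> int:
        return len(self._ids)

    def __getitem__(self, index: int) -> dict:
        tid = self._ids[index]
        if tid not in self._cache:
            # Per-thread details are loaded only when first requested.
            self._cache[tid] = self._fetch(tid)
        return self._cache[tid]
```

A search view could then keep the list of ids for the whole result set and only call `fetch` for the threads actually on screen.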

> Have you tried simply calling list() on your thread
> iterator to see how expensive it is?  My bet is that it's quite cheap,
> both memory-wise and CPU-wise.
Funny thing:
 q=Database().create_query('*')
 time tlist = list(q.search_threads())
raises a NotmuchError(STATUS.NOT_INITIALIZED) exception. For some reason,
the list constructor must read more than once from the iterator.
So this is not an option, but even if it worked, it would show
the same behaviour as my test above.
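One plausible explanation, offered as an assumption rather than something checked against the bindings' source: Python's list() asks its argument for a length hint before iterating, which calls `__len__` when the object defines one. If `__len__` counts the results by walking the one-time iterator, the iteration that follows finds it exhausted. The hypothetical OneShotThreads class below mimics that behaviour; a list comprehension never calls `__len__`, which would also explain why the comprehension in the first test worked.

```python
class OneShotThreads:
    """Hypothetical stand-in for a one-time iterator over query results.

    Not the notmuch bindings' class: it only mimics an iterator whose
    __len__ counts results by walking (and thereby exhausting) them.
    """

    def __init__(self, items):
        self._source = iter(items)  # can be walked exactly once
        self._exhausted = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._exhausted:
            # Stands in for NotmuchError(STATUS.NOT_INITIALIZED).
            raise RuntimeError("iterator already consumed")
        try:
            return next(self._source)
        except StopIteration:
            self._exhausted = True
            raise

    def __len__(self):
        # Counting walks the one-shot source, leaving nothing to iterate.
        count = sum(1 for _ in self._source)
        self._exhausted = True
        return count


ids = ["t1", "t2", "t3"]
print([t for t in OneShotThreads(ids)])  # no __len__ call, so this works
try:
    list(OneShotThreads(ids))  # list() asks for a length hint first
except RuntimeError as err:
    print("list() failed:", err)
```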

Would it be very hard to implement a Query.search_thread_ids()?
The name is a bit off, since this would have to be done at a lower level.
Cheers,
/p

Thread overview: 11+ messages
2011-05-26  8:31 one-time-iterators Patrick Totzke
2011-05-26 17:20 ` one-time-iterators Carl Worth
2011-05-26 20:18   ` one-time-iterators Austin Clements
2011-05-26 21:47     ` Patrick Totzke [this message]
     [not found]     ` <1306442683-sup-9315@brick>
     [not found]       ` <20110526214302.GR29861@mit.edu>
2011-05-26 22:22         ` one-time-iterators Patrick Totzke
2011-05-27  2:41           ` one-time-iterators Austin Clements
2011-05-27 18:04             ` one-time-iterators Patrick Totzke
2011-05-27 19:29               ` one-time-iterators Austin Clements
2011-05-28  8:58                 ` one-time-iterators Patrick Totzke
2011-05-31  1:05                   ` one-time-iterators Austin Clements
2011-05-26 21:16   ` one-time-iterators Michael Hudson-Doyle
