unofficial mirror of notmuch@notmuchmail.org
* experimental logging branch
@ 2011-12-11  3:27 David Bremner
  2011-12-12  0:12 ` David Bremner
  0 siblings, 1 reply; 2+ messages in thread
From: David Bremner @ 2011-12-11  3:27 UTC (permalink / raw)
  To: Notmuch Mail



I have pushed a branch, jlog, to

  git://pivot.cs.unb.ca/notmuch.git

This uses the jlog library (http://labs.omniti.com/labs/jlog)
to atomically log messages in a pub-sub model.

On this branch you can enable logging of tagging operations with

   notmuch config set log.subscribers 'name1;name2;name3'

The "subscriber names" identify clients (one could imagine one client
dealing with general tag syncing, another with nmbug, and so on).
  
One can read the log via the jlog api (there are perl bindings) or via
the "notmuch log" command. 

    notmuch log name1 

will print all of the log messages since the last time the queue was
read.

For tag removal, for example, these lines look like

T-1322702130-26068-3-git-send-email-bremner@debian.org test
T-1322702130-26068-1-git-send-email-bremner@debian.org test
T-7fbe6befcf31881a9bca672f55b93501249a220c.1322859389.git.jani@nikula.org test
T-716da00e176e1dc6af0ba248caee40acee733120.1322859389.git.jani@nikula.org test
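
As a rough illustration of how a client might consume these lines, here
is a minimal Python sketch of a parser. It assumes the format is "T",
then an operation character, then the message-id, a space, and the tag.
Only the "-" (removal) form appears above; treating "+" as the addition
form is an assumption, and the names TagChange and parse_log_line are
hypothetical, not part of the branch.

```python
from typing import NamedTuple


class TagChange(NamedTuple):
    op: str          # "-" for removal; "+" for addition is an assumption
    message_id: str
    tag: str


def parse_log_line(line: str) -> TagChange:
    """Parse one log line of the assumed form 'T<op><message-id> <tag>'."""
    line = line.rstrip("\n")
    if len(line) < 3 or line[0] != "T" or line[1] not in "+-":
        raise ValueError(f"unrecognized log line: {line!r}")
    # Assumption: the tag is the last space-separated field on the line.
    message_id, sep, tag = line[2:].rpartition(" ")
    if not sep or not message_id or not tag:
        raise ValueError(f"unrecognized log line: {line!r}")
    return TagChange(line[1], message_id, tag)
```

Message-ids may themselves contain "-" characters, so the sketch splits
on the last space rather than on the hyphens.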

An example application would be a cron job that writes all tag changes
to an nmbug-like git repo.

There are a few rough edges and I expect to rebase the branch a bit
more, but feel free to have a look at it.  I guess I'm curious if people
think jlog is an acceptable dependency, and whether the whole logging
thing seems worthwhile. 

I did some preliminary tests: when adding a tag to, or deleting one from,
200k messages, logging added about a 5% time penalty. I didn't notice any
really heavy I/O (no SSD here, but i7 cpu). Tests were roughly 95% CPU bound.

There are some preliminary debian packages for jlog at 

      git://pivot.cs.unb.ca/jlog.git

The upstream source for jlog is at 
    
    https://github.com/omniti-labs/jlog





* Re: experimental logging branch
  2011-12-11  3:27 experimental logging branch David Bremner
@ 2011-12-12  0:12 ` David Bremner
  0 siblings, 0 replies; 2+ messages in thread
From: David Bremner @ 2011-12-12  0:12 UTC (permalink / raw)
  To: Notmuch Mail


On Sat, 10 Dec 2011 23:27:02 -0400, David Bremner <david@tethera.net> wrote:
> This uses the jlog library (http://labs.omniti.com/labs/jlog)
> to atomically log messages in pub-sub model.

Some more explanations. 

Part 1: pub-what?
=================

What is pub-sub? Conceptually you can think of it as a set of queues
where each "published" message is magically
replicated and put in the queue of each "subscriber". Of course in
practice one only needs one queue, and to keep track of how far each
subscriber has read.  So, we have a queue, and one head pointer per
subscriber.  We can discard anything in the queue that all of the
subscribers have read past.
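
The one-queue, one-pointer-per-subscriber idea can be sketched in a few
lines of Python. This is only an in-memory illustration of the concept,
not how jlog stores things on disk; the class and method names are
hypothetical.

```python
class PubSubQueue:
    """One shared queue plus one read pointer per subscriber."""

    def __init__(self, subscribers):
        if not subscribers:
            raise ValueError("at least one subscriber is required")
        self._items = []   # messages not yet read by everyone, oldest first
        self._base = 0     # absolute index of self._items[0]
        # Absolute index of the next message each subscriber will read.
        self._heads = {name: 0 for name in subscribers}

    def publish(self, message):
        self._items.append(message)

    def read(self, name):
        """Return everything published since `name` last read."""
        head = self._heads[name]
        pending = self._items[head - self._base:]
        self._heads[name] = self._base + len(self._items)
        self._discard_consumed()
        return pending

    def _discard_consumed(self):
        # Anything below the slowest subscriber's pointer can be dropped.
        low = min(self._heads.values())
        drop = low - self._base
        if drop > 0:
            del self._items[:drop]
            self._base = low

    def __len__(self):
        return len(self._items)
```

With subscribers "name1" and "name2", a message stays in the queue until
both have read past it, which is the discard rule described above.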

In jlog this data structure is on disk, which is why it is called
"durable". This means there are no sockets to communicate between the
publisher (notmuch) and the subscribers (e.g. a proposed tag syncing
tool, described more below).  They interact via a directory (currently
under .notmuch). Because the data structure on disk is not completely
trivial (not that fancy either, but more than a stream of bytes) both
writers and readers need to use the jlog library to interact with the
queue.  "notmuch log" is one such reader.

I'm not that invested in jlog, but I looked around and didn't find any
other similar solutions that had some atomicity guarantees without some
kind of broker (read: yet another daemon running on the machine).

> On this branch you can enable logging of tagging operations by 
> 
>    notmuch config set log.subscribers 'name1;name2;name3'

The command "notmuch log" lets one read the queue from the shell.
"notmuch log name1" dumps any messages (only the string content; there
are timestamps but these are currently ignored) queued for "name1" to
stdout. So one can interact with this queue without learning about the
jlog api (or more precisely, without copy-pasting the example programs
from the wiki, as I did).

Part 2: Ok, but what is it good for?
====================================

OK, so there is this tool, but why should we bother?  I think tag
synchronization is one of the big missing pieces for notmuch (probably
because Carl only reads mail on one computer ;) ). There are various
hacks, but they are all based on dump/restore.  nmbug only manages to
have (mostly) acceptable performance because the query of "tags starting
with notmuch::" (done in a hacky way) restricts the output to manageable
levels.  I think what we need is a way to update incrementally, and an
obvious way to do this for tags is to keep track of additions and
deletions, and maintain a "shadow" of the database on disk in some form
more amenable to synchronization. This could be a directory/file tree
like the initial versions of nmbug, or some slightly fancier thing like
the bare git repo used by current versions of nmbug. With the jlog
patches to notmuch, one or more scripts could run (in cron, or perhaps
using something like inotify) to treat the log of tagging operations as
essentially a patch to update the "shadow tag database".  In my case I
would probably want two subscribers, one for my whole tag database, and
one to update the set published in nmbug.  
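
To make the "log as a patch" idea concrete, here is a small Python
sketch that applies log lines to a shadow directory tree. Everything
about the layout is an assumption made for illustration: one directory
per message-id, one empty file per tag, names percent-encoded so they are
safe as path components. The "T+" form for additions is also assumed
(only "T-" lines are shown in the first message), and apply_log is a
hypothetical name.

```python
import os
from urllib.parse import quote


def apply_log(lines, shadow_dir):
    """Apply tag-change log lines to a shadow tree under shadow_dir."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        op = line[:2]
        # Assumption: the tag is the last space-separated field.
        message_id, sep, tag = line[2:].rpartition(" ")
        if op not in ("T+", "T-") or not sep or not message_id or not tag:
            raise ValueError(f"unrecognized log line: {line!r}")
        # Percent-encode so "/" and similar characters cannot escape the tree.
        msg_dir = os.path.join(shadow_dir, quote(message_id, safe=""))
        tag_path = os.path.join(msg_dir, quote(tag, safe=""))
        if op == "T+":  # assumed form for tag addition
            os.makedirs(msg_dir, exist_ok=True)
            open(tag_path, "a").close()
        else:  # "T-", tag removal
            try:
                os.remove(tag_path)
            except FileNotFoundError:
                pass  # removing an absent tag is treated as a no-op
```

A cron job could feed this the output of "notmuch log name1", split into
lines, and then commit the resulting tree to git.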

A wilder idea would be to use the queue to help resolve contention
for write access to the Xapian database. Clients would write into a
queue, and notmuch would read operations to perform out of the queue. To
be honest, I'm not sure this is really better than just having clients
use locking and blocking/retrying.  

Part 3: Couldn't we do this with hooks?
=======================================

Conceptually, yes. But there are a few things to figure out:

1) Hooks are a CLI feature, not a library feature. Do we want the
   library to support something like hooks?

2) The cost of an exec per elementary tagging operation is quite high;
   maybe some kind of batching could help with this.

3) Atomicity/locking would need to be dealt with by each script.  For
   example, git update-index, used by nmbug (and git add), will fail if
   some other operation is in progress. But having a hook block sounds a
   bit nightmarish.

None of this stuff is my area of expertise; maybe some of you have
clearer ideas about how this could/should be handled.

d





