unofficial mirror of notmuch@notmuchmail.org
From: David Bremner <david@tethera.net>
To: Notmuch Mail <notmuch@notmuchmail.org>
Subject: Re: experimental logging branch
Date: Sun, 11 Dec 2011 20:12:13 -0400
Message-ID: <87liqik5r6.fsf@zancas.localnet>
In-Reply-To: <87obvfu6t5.fsf@zancas.localnet>

On Sat, 10 Dec 2011 23:27:02 -0400, David Bremner <david@tethera.net> wrote:
> This uses the jlog library (http://labs.omniti.com/labs/jlog)
> to atomically log messages in pub-sub model.

Some more explanations. 

Part 1: pub-what?
=================

 What is pub-sub? Conceptually you can think of
it as a set of queues where each "published" message is magically
replicated and put in the queue of each "subscriber". Of course in
practice one only needs one queue, and to keep track of how far each
subscriber has read.  So, we have a queue, and one head pointer per
subscriber.  We can discard anything in the queue that all of the
subscribers have read past.
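
As an illustration only, the single-queue-plus-head-pointers idea can be
sketched in memory as follows. This is not how jlog is implemented (jlog
keeps its structure on disk); the class and method names are made up for
the example.

```python
class PubSubQueue:
    """One shared queue plus one read position per subscriber (hypothetical sketch)."""

    def __init__(self, subscribers):
        self.entries = []   # messages not yet read by every subscriber
        self.base = 0       # absolute index of entries[0]
        self.heads = {name: 0 for name in subscribers}  # next index each will read

    def publish(self, message):
        self.entries.append(message)

    def read(self, subscriber):
        """Return the messages this subscriber has not seen and advance its head."""
        start = self.heads[subscriber] - self.base
        unread = self.entries[start:]
        self.heads[subscriber] = self.base + len(self.entries)
        self._discard()
        return unread

    def _discard(self):
        """Drop entries that all subscribers have read past."""
        if not self.heads:
            return
        oldest = min(self.heads.values())
        drop = oldest - self.base
        if drop > 0:
            del self.entries[:drop]
            self.base = oldest


q = PubSubQueue(["name1", "name2"])
q.publish("first")
q.publish("second")
print(q.read("name1"))   # name1 sees both messages
print(len(q.entries))    # still 2: name2 has not read them yet
print(q.read("name2"))   # name2 sees both messages
print(len(q.entries))    # 0: everyone has read past them
```

Each message is stored once; only the per-subscriber positions differ.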

In jlog this data structure is on disk, which is why it is called
"durable". This means there are no sockets to communicate between the
publisher (notmuch) and the subscribers (e.g. a proposed tag synching
tool, described more below).  They interact via a directory (currently
under .notmuch). Because the data structure on disk is not completely
trivial (not that fancy either, but more than a stream of bytes) both
writers and readers need to use the jlog library to interact with the
queue.  "notmuch log" is one such reader.

I'm not that invested in jlog, but I looked around and didn't find any
other similar solutions that had some atomicity guarantees without some
kind of broker (read: yet another daemon running on the machine).  

> On this branch you can enable logging of tagging operations by 
> 
>    notmuch config set log.subscribers 'name1;name2;name3'

The command "notmuch log" lets one read the queue from the shell.
"notmuch log name1" dumps any messages (only the string content; there
are timestamps but these are currently ignored) queued for "name1" to
stdout. So one can interact with this queue without learning about the
jlog API (or more precisely, without copy-pasting the example programs
from the wiki like I did).
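
For a script that would rather not shell out by hand, a thin wrapper
around "notmuch log" might look like the sketch below. It assumes the
experimental branch is installed and that the command prints one message
per line; the actual output format is not specified here, so treat the
line splitting as a guess. The function name and the "run" parameter are
invented for the example.

```python
import subprocess


def read_log(subscriber, run=subprocess.run):
    """Return the messages queued for one subscriber as a list of strings.

    Assumption: `notmuch log NAME` (experimental branch only) writes one
    message per line to stdout. `run` is injectable so the function can
    be exercised without notmuch installed.
    """
    result = run(
        ["notmuch", "log", subscriber],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling read_log("name1") would then return whatever is queued for the
subscriber "name1".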

Part 2: Ok, but what is it good for?
====================================

OK, so there is this tool, but why should we bother?  I think tag
synchronization is one of the big missing pieces for notmuch (probably
because Carl only reads mail on one computer ;) ). There are various
hacks, but they are all based on dump/restore.  nmbug only manages to
have (mostly) acceptable performance because the query of "tags starting
with notmuch::" (done in a hacky way) restricts the output to manageable
levels.  I think what we need is a way to update incrementally, and an
obvious way to do this for tags is to keep track of additions and
deletions, and maintain a "shadow" of the database on disk in some form
more amenable to synchronization. This could be a directory/file tree
like the initial versions of nmbug, or some slightly fancier thing like
the bare git repo used by current versions of nmbug. With the jlog
patches to notmuch, one or more scripts could run (in cron, or perhaps
using something like inotify) to treat the log of tagging operations as
essentially a patch to update the "shadow tag database".  In my case I
would probably want two subscribers, one for my whole tag database, and
one to update the set published in nmbug.  
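
To make the "log as a patch" idea concrete, here is a rough sketch of a
subscriber script's core loop. The record format ("+tag message-id" or
"-tag message-id") is purely hypothetical, since the format of the logged
messages is not pinned down above, and the shadow database is just a
dictionary standing in for a file tree or git repo.

```python
def apply_log(shadow, records):
    """Apply logged tag additions and deletions to a shadow tag database.

    `shadow` maps message id -> set of tags. Each record is ASSUMED to
    look like "+tag message-id" or "-tag message-id"; this format is
    made up for illustration.
    """
    for record in records:
        op_tag, _, message_id = record.partition(" ")
        if len(op_tag) < 2 or op_tag[0] not in "+-" or not message_id:
            raise ValueError(f"unrecognized record: {record!r}")
        op, tag = op_tag[0], op_tag[1:]
        tags = shadow.setdefault(message_id, set())
        if op == "+":
            tags.add(tag)
        else:
            tags.discard(tag)
    return shadow


shadow = {"msg1@example": {"inbox", "unread"}}
apply_log(shadow, ["-unread msg1@example", "+notmuch::patch msg2@example"])
print(sorted(shadow["msg1@example"]))
print(sorted(shadow["msg2@example"]))
```

Two subscribers would simply run this against two different shadows,
each reading from its own position in the queue.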

A more wild idea would be to use the queue to help resolve contention
for write access to the Xapian database. Clients would write into a
queue, and notmuch would read operations to perform out of the queue. To
be honest, I'm not sure this is really better than just having clients
use locking and blocking/retrying.  

Part 3: Couldn't we do this with hooks?
=======================================

Conceptually, yes. But there are a few things to figure out:

1) hooks are a CLI feature, not a library feature. Do we want the
   library to support something like hooks?

2) The cost of an exec per elementary tagging operation is quite high;
   maybe some kind of batching could help with this.

3) Atomicity/locking would need to be dealt with by each script.  For
   example git update-index, used by nmbug (and git add), will fail if
   some other operation is in progress. But having a hook block sounds a
   bit nightmarish.
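
On point 2, one possible shape for batching is to hand a hook many
operations on stdin instead of exec'ing once per operation. The sketch
below is only that: notmuch has no such hook interface as far as this
message is concerned, and the function name, the stdin convention, and
the batch size are all assumptions.

```python
import subprocess
import sys


def run_hook_batched(hook_cmd, operations, batch_size=100):
    """Feed operations to a hook on stdin, starting one process per batch.

    Hypothetical interface: the hook reads one operation per line.
    Returns the number of processes started.
    """
    started = 0
    for i in range(0, len(operations), batch_size):
        batch = operations[i:i + batch_size]
        subprocess.run(hook_cmd, input="\n".join(batch) + "\n",
                       text=True, check=True)
        started += 1
    return started


# Stand-in hook that just consumes its input.
hook = [sys.executable, "-c", "import sys; sys.stdin.read()"]
ops = [f"operation {n}" for n in range(250)]
print(run_hook_batched(hook, ops))  # 3 processes instead of 250
```

This only addresses the exec cost; the atomicity question in point 3 is
untouched.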

None of this stuff is my area of expertise; maybe some of you have
clearer ideas about how this could/should be handled.

d

Thread overview: 2+ messages
2011-12-11  3:27 experimental logging branch David Bremner
2011-12-12  0:12 ` David Bremner [this message]
