unofficial mirror of notmuch@notmuchmail.org
From: David Bremner <david@tethera.net>
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>, notmuch@notmuchmail.org
Subject: Re: [RFC2 Patch 5/5] lib: iterator API for message properties
Date: Wed, 01 Jun 2016 20:29:59 -0300
Message-ID: <87lh2ofpxk.fsf@zancas.localnet>
In-Reply-To: <87eg8ht2sb.fsf@alice.fifthhorseman.net>

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

> On Tue 2016-05-31 21:52:06 -0400, Daniel Kahn Gillmor wrote:
>
> do we actually need this abstraction?  If we're aiming to build specific
> new features (the two i'm thinking of are cryptographic-session-keys and
> reference-adjustments), couldn't we implement those features explicitly
> in xapian with their own special prefix, rather than treating them as a
> generic "property"?

Sure, it's certainly possible.

I guess if you don't care about the possibility of iterating over all
pairs with a given key prefix (which I admit makes more sense for the
config API), then the code could be simplified to look more like the
tag list handling code.  C is pretty crap at generics, but looking at
tags.c, it's really just iterators over notmuch_string_list_t, so it
could probably be generalized to serve here.
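For readers who haven't looked at tags.c, here is a minimal, self-contained sketch of the general pattern: a singly linked string list plus an iterator with the valid / get / move_to_next shape that notmuch's public iterators (e.g. notmuch_tags_t) expose. All type and function names below (string_list_t, string_iter_t, string_iter_valid, ...) are hypothetical stand-ins and are not the actual internals of tags.c or notmuch_string_list_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an internal string list: a singly linked
 * list that keeps a pointer to the last "next" slot for O(1) append. */
typedef struct string_node {
    char *string;
    struct string_node *next;
} string_node_t;

typedef struct {
    string_node_t *head;
    string_node_t **tail;
} string_list_t;

/* Iterator in the valid / get / move_to_next style. */
typedef struct {
    const string_node_t *current;
} string_iter_t;

static void string_list_init(string_list_t *list) {
    list->head = NULL;
    list->tail = &list->head;
}

/* Append a private copy of s; aborts on allocation failure. */
static void string_list_append(string_list_t *list, const char *s) {
    assert(s != NULL);
    size_t n = strlen(s) + 1;
    string_node_t *node = malloc(sizeof *node);
    char *copy = malloc(n);
    if (node == NULL || copy == NULL)
        abort();
    memcpy(copy, s, n);
    node->string = copy;
    node->next = NULL;
    *list->tail = node;
    list->tail = &node->next;
}

/* Free every node and leave the list empty and reusable. */
static void string_list_free(string_list_t *list) {
    string_node_t *node = list->head;
    while (node != NULL) {
        string_node_t *next = node->next;
        free(node->string);
        free(node);
        node = next;
    }
    string_list_init(list);
}

static string_iter_t string_list_iter(const string_list_t *list) {
    string_iter_t it = { list->head };
    return it;
}

static bool string_iter_valid(const string_iter_t *it) {
    return it->current != NULL;
}

/* Returns NULL once the iterator is exhausted. */
static const char *string_iter_get(const string_iter_t *it) {
    return it->current ? it->current->string : NULL;
}

static void string_iter_move_to_next(string_iter_t *it) {
    if (it->current)
        it->current = it->current->next;
}
```

A caller would loop with `for (it = string_list_iter(&list); string_iter_valid(&it); string_iter_move_to_next(&it))`. The point is that nothing in this shape is specific to tags, which is why it seems plausible to reuse it for another term prefix.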

For each such prefix, one would need to roughly duplicate patches 1/5
and 3/5.  It took me a little while to figure 1/5 out, but now that I
know how, it would be less trouble.  I guess my thinking here was that
I would provide a low-level interface that people using the C API or
bindings could use without hacking on xapian.

> We already have a bit of an uncomfortable fit with tags and special
> flags (encrypted, signed, attachment, etc), where some are expected to
> be set and cleared automagically and some are expected to be manipulated
> directly by the user.  Are we setting ourselves up for more of the same,
> or is there a principled way that a user can know which properties it's
> kosher for them to set and clear, and which ones they should leave
> alone?

XPROPERTY is an internal prefix, which means it isn't added to the query
parser.  As it happens, I didn't plan on CLI access to these terms
either.  Both of those choices are trade-offs meant to signal that these
are internal metadata, suitable for manipulation by programs.  Such
programs could be scripts using the python or ruby bindings.
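To make the "internal prefix" idea concrete, here is a sketch of how a key/value property might be flattened into a single term and split back apart, plus a filter for pairs whose key starts with a given prefix. The `key=value` encoding, the `XPROPERTY` string used as a literal term prefix, and every helper name here are assumptions for illustration. The code does not call Xapian or the notmuch library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical internal term prefix; not exposed to the query parser. */
#define PROPERTY_PREFIX "XPROPERTY"

/* Encode (key, value) as PROPERTY_PREFIX "key=value" into buf.
 * Fails if the result would not fit, or if key contains '=' (which
 * would make the key/value split ambiguous). */
static bool property_to_term(const char *key, const char *value,
                             char *buf, size_t size) {
    assert(key != NULL && value != NULL && buf != NULL);
    if (strchr(key, '=') != NULL)
        return false;
    int n = snprintf(buf, size, "%s%s=%s", PROPERTY_PREFIX, key, value);
    return n >= 0 && (size_t) n < size;
}

/* Locate the key inside a property term.  Returns NULL if term is not
 * a property term; otherwise sets *key_len and points *value at the
 * text after the first '='. */
static const char *term_key(const char *term, size_t *key_len,
                            const char **value) {
    size_t plen = strlen(PROPERTY_PREFIX);
    if (strncmp(term, PROPERTY_PREFIX, plen) != 0)
        return NULL;
    const char *key = term + plen;
    const char *eq = strchr(key, '=');
    if (eq == NULL)
        return NULL;
    *key_len = (size_t) (eq - key);
    *value = eq + 1;
    return key;
}

/* Decode a property term: copy the key into key_buf and point *value
 * at the value text inside term. */
static bool term_to_property(const char *term, char *key_buf,
                             size_t key_size, const char **value) {
    size_t key_len;
    const char *key = term_key(term, &key_len, value);
    if (key == NULL || key_len + 1 > key_size)
        return false;
    memcpy(key_buf, key, key_len);
    key_buf[key_len] = '\0';
    return true;
}

/* True if term is a property whose key starts with key_prefix. */
static bool term_has_key_prefix(const char *term, const char *key_prefix) {
    size_t key_len;
    const char *value;
    const char *key = term_key(term, &key_len, &value);
    size_t plen = strlen(key_prefix);
    return key != NULL && key_len >= plen &&
           strncmp(key, key_prefix, plen) == 0;
}
```

Because the key precedes the value in this encoding, all pairs sharing a key prefix also share a term prefix, so in a lexicographically sorted term list they would form one contiguous run. Values may contain `=`; keys may not.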

> If we add new specific features, we could potentially augment the dump
> format explicitly for them, without having the property abstraction.

We could, but I think we should change the dump format quite rarely,
since we risk breaking people's scripts.  So if we did it for one
prefix, I'd like to do it in an extensible way so that adding new
prefixes is somewhat transparent.  A per-prefix format also means some
duplication of effort/code in notmuch dump/restore for each new prefix.

It's probably true that per-prefix dump format would be more compact,
since the keys would be implicit, rather than repeated for every pair.
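The compactness point can be illustrated with a rough sketch. In a generic property line, every value carries its key. In a per-prefix line, the key can be implied by the line type and written once. Both line layouts below (`#prop <id> key=v ...` and `#<key> <id> v ...`) are invented for illustration and are not notmuch's dump format. Escaping of spaces and special characters is deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Append s to buf (capacity size, current length *len), keeping buf
 * NUL-terminated.  Returns false on overflow. */
static bool append(char *buf, size_t size, size_t *len, const char *s) {
    size_t n = strlen(s);
    if (*len + n + 1 > size)
        return false;
    memcpy(buf + *len, s, n + 1);
    *len += n;
    return true;
}

/* Hypothetical generic layout: "#prop <id> key=v1 key=v2 ...". */
static bool dump_generic(char *buf, size_t size, const char *id,
                         const char *key, const char *const *values,
                         size_t n) {
    size_t len = 0;
    assert(buf != NULL);
    if (size == 0)
        return false;
    buf[0] = '\0';
    if (!append(buf, size, &len, "#prop ") || !append(buf, size, &len, id))
        return false;
    for (size_t i = 0; i < n; i++)
        if (!append(buf, size, &len, " ") || !append(buf, size, &len, key) ||
            !append(buf, size, &len, "=") ||
            !append(buf, size, &len, values[i]))
            return false;
    return true;
}

/* Hypothetical per-prefix layout: "#<key> <id> v1 v2 ..."; the key
 * appears once per line instead of once per value. */
static bool dump_per_prefix(char *buf, size_t size, const char *id,
                            const char *key, const char *const *values,
                            size_t n) {
    size_t len = 0;
    assert(buf != NULL);
    if (size == 0)
        return false;
    buf[0] = '\0';
    if (!append(buf, size, &len, "#") || !append(buf, size, &len, key) ||
        !append(buf, size, &len, " ") || !append(buf, size, &len, id))
        return false;
    for (size_t i = 0; i < n; i++)
        if (!append(buf, size, &len, " ") ||
            !append(buf, size, &len, values[i]))
            return false;
    return true;
}
```

With two values for one key, the generic line repeats the key twice while the per-prefix line names it once. The saving grows with the number of values per key, at the cost of one line format (and parser) per prefix.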

> We already have some explicit features for each message (subject,
> from, to, attachment, mimetype, thread id, etc), and most of them are
> derived from the message itself, with the hope that it could be
> re-derived given just the message body.  Is there a distinction
> between properties that can be derived from the message body and
> properties that need to be additionally derived from some other data?

As Tomi always says, naming is the hardest thing; "properties" is a bit
generic.  I'm not sure of the distinction you're making between the
"message" and the "message body" here.  I think most of our derived
terms come from the message header.  My intent here is that "properties"
are used for things that cannot be derived from the message (header or
body).

TL;DR:

     - a per-prefix approach requires new code in the library and in
       dump/restore for every prefix.
     + the dump format might be more compact if done per prefix.
     + this code would be simpler than the generic properties code,
       mainly because it would not need key-value pairs.
     - the library and dump/restore are parts of notmuch that have the
       potential to "break the world".  Not many people are
       comfortable hacking on them.
     - changing the dump format is something like an ABI change for
       people whose scripts rely on dump/restore.
