* Idea for storing tags
From: martin f krafft @ 2010-01-11 22:19 UTC
To: mailtags discussion list; +Cc: notmuch discussion list
Folks, over in #notmuch, we just floated an idea that I'd like to
get out to you. We've been debating storing tags for messages.
Therefore I am cross-posting. Please forgive me.
So far, there are two approaches:
1. External database, which has the downside of not being
synchronisable with standard IMAP, like the rest of your mail
(assuming you use IMAP). Also, it's possible for mailstore and
database to get out of sync.
2. In-headers, which has the downside of leaking (e.g. when
bouncing), and incurs the risks associated with message rewrites
(which I think is pretty much ignorable, but it's still there).
Also, there's a performance issue, but in the context of an
indexer like notmuch, this is negligible.
The leakage is real, though, and I think it makes in-headers
unusable. After all, I don't ever want anyone else to know that
I tag e-mails from my boss as "from-idiots", and I forward and
bounce mail on a regular basis. I could tell my MTA to remove
those headers, but I might forget to do that on a new system.
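A minimal sketch of doing that scrubbing in the client itself before a bounce or forward, instead of relying on the MTA. It uses Python's standard email package; the header names in PRIVATE_TAG_HEADERS are assumptions (X-MailTags is the header MailTags uses, "Tags" is just a placeholder).

```python
from email import message_from_bytes, policy

# Hypothetical list of private tag headers to scrub; adjust to your setup.
PRIVATE_TAG_HEADERS = ("X-MailTags", "Tags")


def strip_tag_headers(raw: bytes) -> bytes:
    """Return a copy of a message with private tag headers removed,
    e.g. before bouncing it or forwarding it as an attachment."""
    msg = message_from_bytes(raw, policy=policy.default)
    for name in PRIVATE_TAG_HEADERS:
        # Deletes every occurrence of the header; absent headers are ignored.
        del msg[name]
    return msg.as_bytes()
```

This only protects mail that actually passes through code like this; anything re-sent by another tool on another system still leaks, which is the objection above.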
We also previously determined that IMAP keywords are pretty much
useless as they are stored per mailbox, not per message, not
standardised, and limited in their length anyway [0]. This also
means that we don't really need to investigate sensibly storing tags
in Maildir (e.g. with xattrs), because IMAP cannot transport them.
0. http://lists.madduck.net/pipermail/mailtags/2007-August/msg00016.html
Seriously, who implemented IMAP4rev1 and what sort of crack were
they smoking??
I remember there was some KDE groupware contacts manager that used
IMAP to synchronise contacts. At first, this sounds horrible, but
when you detach IMAP from RFC822, it becomes a generic synchronising
protocol. The next step is then straightforward, and I want to
share this idea with you:
How about using pseudo-mails stored in Maildir and synchronised by
IMAP? E.g. every folder could have a subfolder .TAGS and if we find
a way to smartly pair messages between parent and subfolder, we'd
have a tag store alongside the mailstore it refers to, but without
the danger of leakage, and without having to rewrite messages.
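A rough sketch of one such pseudo-mail and where it could live, using Python's standard email and mailbox modules. Pairing via a hypothetical X-Tags-For header that holds the parent's Message-ID is an assumption, as is the subject format; mailbox.Maildir.add_folder("TAGS") creates the Maildir++ subfolder ".TAGS".

```python
import mailbox
from email import message_from_bytes, policy
from email.message import EmailMessage


def make_tag_pseudomail(message_id: str, tags: set[str]) -> bytes:
    """Build a pseudo-mail holding the tags of one real message.
    X-Tags-For is a hypothetical header pairing it with its parent."""
    pm = EmailMessage()
    pm["Subject"] = "[tags] " + message_id
    pm["X-Tags-For"] = message_id
    pm.set_content("\n".join(sorted(tags)) + "\n")
    return pm.as_bytes()


def read_tag_pseudomail(raw: bytes) -> tuple[str, set[str]]:
    """Recover (Message-ID, tags) from a pseudo-mail built above."""
    pm = message_from_bytes(raw, policy=policy.default)
    tags = {line for line in pm.get_content().splitlines() if line}
    return str(pm["X-Tags-For"]), tags


def store_in_tags_folder(maildir_path: str, raw: bytes) -> str:
    """Add the pseudo-mail to the .TAGS subfolder; returns its Maildir key."""
    parent = mailbox.Maildir(maildir_path, create=False)
    try:
        tags_folder = parent.get_folder("TAGS")
    except mailbox.NoSuchMailboxError:
        tags_folder = parent.add_folder("TAGS")  # creates <path>/.TAGS
    return tags_folder.add(raw)
```

A client that knows nothing about this would show the result as an ordinary message in a folder called TAGS, which is the failure mode described next.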
The major problem with this is when clients don't understand this
"protocol", for then they will display all .TAGS folders as regular
IMAP folders, and try to treat the messages therein as regular
mails. Somewhere sometime this is bound to blow up and I don't
really know how to prevent that.
Anyway, the idea is out now. Thoughts?
--
martin | http://madduck.net/ | http://two.sentenc.es/
echo Prpv a\'rfg cnf har cvcr | tr Pacfghnrvp Cnpstuaeic
spamtraps: madduck.bogus@madduck.net
* Re: Idea for storing tags
From: Scott Robinson @ 2010-01-12 3:44 UTC
To: notmuch
I wrote a script to store and sync my tags.
* One filename per message-ID.
* Line-feed-separated tags in each file.
Then the whole structure is controlled via git. Conflict resolution and sync
come for free.
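A sketch of that layout in Python, assuming Message-IDs are percent-encoded to form filenames (the script's actual mapping isn't stated, and Message-IDs may contain "/"). Tags are written sorted, one per line, so the files diff and merge cleanly under git.

```python
from pathlib import Path
from urllib.parse import quote


def tag_path(root: Path, message_id: str) -> Path:
    # Percent-encode the Message-ID so it is always a safe filename
    # (an assumed mapping, not necessarily the original script's).
    return root / quote(message_id, safe="")


def write_tags(root: Path, message_id: str, tags: set[str]) -> None:
    path = tag_path(root, message_id)
    if tags:
        # Sorted, one tag per line: stable content keeps git diffs small.
        path.write_text("".join(t + "\n" for t in sorted(tags)), encoding="utf-8")
    elif path.exists():
        path.unlink()  # an untagged message simply has no file


def read_tags(root: Path, message_id: str) -> set[str]:
    path = tag_path(root, message_id)
    if not path.exists():
        return set()
    return {t for t in path.read_text(encoding="utf-8").splitlines() if t}
```

Committing and syncing is then ordinary git (git add -A, git commit, git pull --rebase, git push). Two machines changing tags on the same message can still produce a merge conflict in that one file.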
It isn't clear what use-case the earlier e-mail is aiming to satisfy. This is
how I solved my tag sync issues, though.
* Re: Idea for storing tags
From: martin f krafft @ 2010-01-12 4:06 UTC
To: notmuch; +Cc: mailtags discussion list
also sprach Scott Robinson <scott@quadhome.com> [2010.01.12.1644 +1300]:
> I wrote a script to store and sync my tags.
>
> * One filename per message-ID.
> * Line-feed-separated tags in each file.
>
> Then the whole structure is controlled via git.
> Conflict resolution and sync come for free.
How do you ensure that the external tag store and your mail store do
not go out of sync? I assume that mails without a tagfile are simply
untagged, so that's hardly the issue. However, if you delete a mail,
how do you ensure that the tag database is cleaned up?
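One possible answer to the clean-up question, sketched in Python under the assumption of one tag file per percent-encoded Message-ID (not necessarily how the script names its files): periodically collect the Message-IDs still present in the mail store, for instance from an indexer, and delete tag files for everything else.

```python
from pathlib import Path
from urllib.parse import unquote


def prune_orphan_tag_files(root: Path, live_ids: set[str]) -> list[str]:
    """Delete tag files whose Message-ID is no longer in the mail store.

    Assumes one percent-encoded Message-ID per file and that `live_ids`
    was collected from the mail store. Returns the pruned IDs, sorted."""
    removed = []
    for path in root.iterdir():
        # Skip directories (.git) and dotfiles such as .gitattributes.
        if not path.is_file() or path.name.startswith("."):
            continue
        message_id = unquote(path.name)
        if message_id not in live_ids:
            path.unlink()
            removed.append(message_id)
    return sorted(removed)
```

Running this on a machine whose mail store is not fully synced would also delete tags for mail that simply hasn't arrived there yet, so the pruning step has to be ordered carefully relative to mail sync.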
Also, do you attach tags automatically, e.g. with procmail on the
server? If so, how do you initiate git-pull locally?
Would you consider sharing your script?
--
martin | http://madduck.net/ | http://two.sentenc.es/
"alle vorurteile kommen aus den eingeweiden."
- friedrich nietzsche
spamtraps: madduck.bogus@madduck.net
* Re: Idea for storing tags
From: Scott Morrison @ 2010-01-12 4:11 UTC
To: mailtags discussion list; +Cc: notmuch discussion list
Thought you would be interested in my experiences and thoughts from actually doing this kind of stuff.
With my software MailTags (www.indev.ca/MailTags.html), I have looked at all these options and decided to go with storing tags in headers (as JSON-formatted data in the X-MailTags header).
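Roughly what tags-as-JSON-in-a-header looks like, as a sketch using Python's email package. The {"tags": [...]} layout is hypothetical; the actual X-MailTags format isn't described here. Note that every tag change means rewriting the whole message, which is where the IMAP cost below comes from.

```python
import json
from email import message_from_bytes, policy


def set_mailtags(raw: bytes, tags: list[str]) -> bytes:
    """Rewrite a message so that X-MailTags holds the given tags as JSON."""
    msg = message_from_bytes(raw, policy=policy.default)
    del msg["X-MailTags"]  # drop any previous value
    # ensure_ascii keeps the header 7-bit, so no RFC 2047 encoding is needed.
    msg["X-MailTags"] = json.dumps({"tags": tags}, ensure_ascii=True)
    return msg.as_bytes()


def get_mailtags(raw: bytes) -> list[str]:
    """Read the tag list back; an absent header means no tags."""
    msg = message_from_bytes(raw, policy=policy.default)
    value = msg.get("X-MailTags")
    return json.loads(str(value))["tags"] if value else []
```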
I have thought seriously about using pseudo emails stored in a specially named directory but feel there are a couple of issues with this.
1. synchronization of tag data with emails -- if they are in a subfolder then it presents the issue of maintaining this subfolder when managing emails (moving, deleting, duplicating etc.), and any clients unaware of the .tag folder are likely to cause a breakage in the tag-data/message association. One way of doing this is to have a global .tag folder.
2. what happens if that message is archived or moved to an exclusively local cache -- e.g. Mail.app on OS X can easily move IMAP messages to a folder resident on the local computer?
3. what happens with duplicates of emails -- I would assume that the message id would be the key to match the tag data to the message. In this system a duplicate of a message could not have a different set of tags from the original (not that this would necessarily be desirable.)
As I mentioned, I went with tags in headers -- though this has its own drawbacks.
Your mention of potential leakage (aka inadvertent disclosure of tag data) is real -- but only if the client used to bounce/forward is not the one that tagged the message (one would assume that if a client can tag, it knows to exclude the tags in a bounce). Mail.app, which I am plugging into, does not forward headers, though it will include all headers in a bounce -- but chances are you aren't tagging messages you are bouncing. :)
The performance issue is very real -- because it means that somehow messages have to be rewritten to the IMAP server, and IMAP doesn't have a mechanism AFAIK for updates. Additionally, IMAP doesn't have a mechanism for simply replacing one message's data with another's -- a new message must be written and the old message must be deleted, so the message's IMAP UID will change, and the client will have to deal with this, especially if it caches the messages.
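The append-then-delete sequence looks roughly like this with Python's imaplib; a sketch, not MailTags' code. It assumes the mailbox is already selected and that the server supports the UIDPLUS extension (RFC 4315), which provides the APPENDUID response code reporting the new UID and the UID EXPUNGE command that removes only the old copy.

```python
import imaplib
import re

# UIDPLUS response code: [APPENDUID <uidvalidity> <new uid>]
APPENDUID_RE = re.compile(rb"\[APPENDUID (\d+) (\d+)\]")


def replace_message(imap, mailbox: str, old_uid: str, new_raw: bytes):
    """Upload a rewritten message and delete the old copy.

    IMAP cannot edit a stored message, so the rewritten copy gets a new
    UID; returns it if the server reported APPENDUID, else None."""
    typ, data = imap.append(mailbox, None, None, new_raw)
    if typ != "OK":
        raise imaplib.IMAP4.error(f"APPEND failed: {data!r}")
    new_uid = None
    for item in data:
        match = APPENDUID_RE.search(item or b"")
        if match:
            new_uid = int(match.group(2))
    # Flag the old copy and expunge just that UID (UID EXPUNGE needs UIDPLUS).
    imap.uid("STORE", old_uid, "+FLAGS", r"(\Deleted)")
    imap.uid("EXPUNGE", old_uid)
    return new_uid
```

Any client that cached the message under old_uid now has to notice that the same message lives on under a different UID, and the whole message travels over the wire again even if only one header changed.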
Also GMAIL IMAP is an issue -- Gmail IMAP is not IMAP; it simply doesn't work like a true IMAP server. Writes to folders in Gmail IMAP are translated to database updates that associate a single record of the message with the folder it was "written" to. Changing headers on a Gmail IMAP message simply will not work, because it will reject the message as an update of the existing record (and not actually write the new data).
Still, tags in headers meant that I didn't have to worry about making sure that the .tags folder is maintained appropriately (throughout moves and deletions), and that the data is stored much closer to the message, for data recovery if it is ever needed and for archiving tags. In any case, this is what I have working, though I am open to considering new approaches.
Scott
ps.
also see my post to the mailtags-list from a few years back
http://lists.madduck.net/pipermail/mailtags/2007-August/msg00017.html
* Potential problem using Git for mail (was: Idea for storing tags)
From: martin f krafft @ 2010-01-12 4:51 UTC
To: notmuch
also sprach Scott Robinson <scott@quadhome.com> [2010.01.12.1644 +1300]:
> Then the whole structure is controlled via git.
> Conflict resolution and sync come for free.
I've just had a good think about this, also because the idea of
abandoning IMAP and using Git has been around for a while and
I have not really wrapped my head around it.
If the MDA delivers to Git, then potentially, you might get into
a situation where you cannot write your own changes back to the
repo. This is also a DoS scenario: I'll just keep sending you
e-mail, and if I manage to pass your mail filters, I'll basically
commit to your mail repository at regular intervals. Say those are
5 seconds. In order for you to write updates to the repo, e.g. to
update tags, then you would need to pull, rebase, and push all
within 5 seconds, for otherwise you'd try to push non-fast-forwards.
This is a bit unrealistic, surely, but there's a real annoyance in it:
you'd have to pull/rebase/push until a push succeeds — until you
found a time window between pull and push during which the MDA
didn't write to the repo. This might take a long time. If this
happens in the background by Cron, it's not a real concern, but if
this becomes a UI issue, I wouldn't know how to handle it.
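The retry loop can be modelled without a real repository. This is a toy simulation, not a git wrapper: pull_rebase and push stand in for git pull --rebase and git push, and the hypothetical NonFastForward exception stands in for a rejected push. Each attempt either does or does not overlap with an MDA commit.

```python
from typing import Callable


class NonFastForward(Exception):
    """Stand-in for git rejecting a push because the remote moved on."""


def sync_until_pushed(pull_rebase: Callable[[], None],
                      push: Callable[[], None],
                      max_attempts: int = 20) -> int:
    """Repeat pull+rebase+push until a push is accepted.

    Returns the number of attempts used; raises RuntimeError after
    max_attempts consecutive rejections."""
    for attempt in range(1, max_attempts + 1):
        pull_rebase()
        try:
            push()
            return attempt
        except NonFastForward:
            continue  # the MDA committed between our pull and our push
    raise RuntimeError(f"no push window found in {max_attempts} attempts")


class SimulatedRepo:
    """Toy model of a shared repo that the MDA also commits to.

    `deliveries` yields one bool per attempt: True means a mail is
    delivered (and committed) between our pull and our push."""

    def __init__(self, deliveries):
        self.remote_head = 0  # number of commits on the remote branch
        self.local_base = 0   # remote head our local branch is rebased on
        self.deliveries = iter(deliveries)

    def pull_rebase(self):
        self.local_base = self.remote_head
        if next(self.deliveries, False):
            self.remote_head += 1  # MDA commit lands before our push

    def push(self):
        if self.remote_head != self.local_base:
            raise NonFastForward()
        self.remote_head += 1  # our tag commit is accepted
```

If a mail arrives during every attempt, the loop never succeeds, which is the DoS case; a background job can afford to keep retrying, an interactive UI cannot.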
--
martin | http://madduck.net/ | http://two.sentenc.es/
don't hate yourself in the morning -- sleep till noon.
spamtraps: madduck.bogus@madduck.net
* Re: Potential problem using Git for mail (was: Idea for storing tags)
From: Jameson Rollins @ 2010-01-12 19:38 UTC
To: notmuch
On Tue, Jan 12, 2010 at 05:51:53PM +1300, martin f krafft wrote:
> If the MDA delivers to Git, then potentially, you might get into
> a situation where you cannot write your own changes back to the
> repo. This is also a DoS scenario: I'll just keep sending you
> e-mail, and if I manage to pass your mail filters, I'll basically
> commit to your mail repository at regular intervals. Say those are
> 5 seconds. In order for you to write updates to the repo, e.g. to
> update tags, then you would need to pull, rebase, and push all
> within 5 seconds, for otherwise you'd try to push non-fast-forwards.
>
> This is a bit unrealistic, surely, but there's a real annoyance in it:
> you'd have to pull/rebase/push until a push succeeds — until you
> found a time window between pull and push during which the MDA
> didn't write to the repo. This might take a long time. If this
> happens in the background by Cron, it's not a real concern, but if
> this becomes a UI issue, I wouldn't know how to handle it.
What about if just the tag information is stored in the repository,
and not the mail itself? In that case only the user would be pushing
into the repo and you wouldn't have to worry about the DoS scenario.
jamie.
* Re: Potential problem using Git for mail (was: Idea for storing tags)
From: martin f krafft @ 2010-01-12 19:55 UTC
To: Jameson Rollins; +Cc: notmuch
also sprach Jameson Rollins <jrollins@finestructure.net> [2010.01.13.0838 +1300]:
> What about if just the tag information is stored in the
> repository, and not the mail itself? In that case only the user
> would be pushing into the repo and you wouldn't have to worry
> about the DoS scenario.
I certainly would like the ability to have messages
automatically tagged on delivery by procmail.
--
martin | http://madduck.net/ | http://two.sentenc.es/
may the bluebird of happiness twiddle your bits.
spamtraps: madduck.bogus@madduck.net
* Re: Idea for storing tags
From: David A. Harding @ 2010-01-12 21:39 UTC
To: martin f krafft
On Tue, Jan 12, 2010 at 11:19:09AM +1300, martin f krafft wrote:
> I think [tag leakage] it makes in-headers unusable. After all, I don't
> ever want anyone else to know that I tag e-mails from my boss as
> "from-idiots",
You can cryptographically hash tags so that third parties can't read
the tags stored in the headers. For security, a salt should be appended
to the tag name to make dictionary attacks on the tags more difficult.
For their owners' convenience, mail clients will want a mapping of hash
to tag name.
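A sketch of the salted-hash idea, assuming SHA-256 (no particular hash was named) and one random salt per user, appended to the tag name as suggested. Because the salt is the same for every message, a given tag always yields the same hash, which keeps tag searches possible; the flip side is that anyone seeing two leaked headers can tell the messages share a tag, just not which one.

```python
import hashlib
import secrets


def new_salt() -> str:
    """Random per-user salt (32 hex characters); store it with the tag map."""
    return secrets.token_hex(16)


def hash_tag(tag: str, salt: str) -> str:
    """Opaque header value for a tag: SHA-256 of the tag name plus the salt."""
    return hashlib.sha256((tag + salt).encode("utf-8")).hexdigest()


def build_tag_map(tags: list[str], salt: str) -> dict[str, str]:
    """Hash-to-tag mapping a client needs to show tags to their owner."""
    return {hash_tag(t, salt): t for t in tags}
```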
> [...] pseudo-mails stored in Maildir and synchronised by IMAP
A single RFC2822 message can store the salt and hash-to-tag database. It
could contain a clear subject and directions to the end user not to move
or delete it. This would not, I think, terribly confuse existing mail
clients or their users.
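A sketch of such a database message with Python's email package. The subject wording, the tags.json attachment name and the JSON layout are all hypothetical; the point is only that the salt and the hash-to-tag map travel inside an ordinary MIME message that any IMAP server will store.

```python
import json
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_tag_db_message(salt: str, mapping: dict[str, str]) -> bytes:
    """One ordinary MIME message holding the salt and the hash-to-tag map."""
    db = EmailMessage()
    db["Subject"] = "Tag database: please do not move or delete this message"
    db.set_content(
        "This message stores the tags used by your mail client.\n"
        "Please do not move, edit or delete it.\n"
    )
    payload = json.dumps({"salt": salt, "tags": mapping}, indent=2).encode("utf-8")
    db.add_attachment(payload, maintype="application", subtype="json",
                      filename="tags.json")
    return db.as_bytes()


def read_tag_db_message(raw: bytes) -> tuple[str, dict[str, str]]:
    """Return (salt, hash-to-tag map) from a message built above."""
    db = message_from_bytes(raw, policy=policy.default)
    for part in db.iter_attachments():
        if part.get_filename() == "tags.json":
            data = json.loads(part.get_content())
            return data["salt"], data["tags"]
    raise ValueError("no tags.json attachment found")
```

Updating the map still means replacing this one message on the server, but only this one, rather than every tagged mail.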
-Dave
--
David A. Harding Website: http://dtrt.org/
1 (609) 997-0765 Email: dave@dtrt.org
Jabber/XMPP: dharding@jabber.org
* Re: Idea for storing tags
From: martin f krafft @ 2010-01-13 1:24 UTC
To: mailtags discussion list, notmuch discussion list
also sprach Scott Morrison <smorr@indev.ca> [2010.01.12.1711 +1300]:
> 1. synchronization of tag data with emails -- if they are in
> a subfolder then it presents the issue of maintaining this
> subfolder when managing emails (moving, deleting, duplicating etc)
> and any clients unaware of the .tag folder are likely to cause
> a breakage in the tag-data/message association. One way of doing
> this is to have a global .tag folder.
A global .tag folder indexed by e.g. message ID, as you state later,
would probably allow for this. Or a file-per-tag design. We'd have
to think carefully about pros and cons for each.
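The two layouts hold the same information and convert into each other; a minimal Python sketch with plain dicts (the IDs and tag names are made-up examples). A per-message layout answers "which tags does this message have?" cheaply, a per-tag layout answers "which messages carry this tag?" cheaply, and the index would have to serve whichever direction the storage doesn't.

```python
from collections import defaultdict


def invert(index: dict[str, set[str]]) -> dict[str, set[str]]:
    """Turn message -> tags into tag -> messages, or the reverse.

    Keys with an empty set vanish, matching "no tag file means untagged"."""
    out: dict[str, set[str]] = defaultdict(set)
    for key, values in index.items():
        for value in values:
            out[value].add(key)
    return dict(out)
```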
When thinking about this, I always have to remind myself that we are
targetting this at a design that has indexed search. If that weren't
the case, searches would be incredibly expensive.
Maybe a better approach would be content addressing (see below).
> 2. what happens if that message is archived or moved to an
> exclusively local cache -- eg. Mail.app on OS X can easily move
> IMAP messages to a folder resident on the local computer?
Well, if the target can store tags, then ideally the MUA should know
how to transfer them along.
Maybe the right thing to do would be to use extended attributes
(which are stored in the inode!), even if they may not be
universally supported yet. If our solution scales, then this might
lead to a significant increase in xattr adoption.
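For illustration, storing tags in an extended attribute from Python on Linux (os.setxattr and friends are Linux-only in the standard library). The attribute name user.mail.tags is hypothetical. Whether it works depends on the filesystem, IMAP will not carry it (as noted at the start of the thread), and copying tools tend to drop xattrs unless asked, e.g. cp --preserve=xattr or rsync -X.

```python
import errno
import os

TAG_ATTR = "user.mail.tags"  # hypothetical name in the Linux "user." namespace


def set_file_tags(path: str, tags: set[str]) -> None:
    """Store tags, newline-separated, in an extended attribute of the file."""
    if tags:
        os.setxattr(path, TAG_ATTR, "\n".join(sorted(tags)).encode("utf-8"))
    else:
        try:
            os.removexattr(path, TAG_ATTR)
        except OSError as exc:
            if exc.errno != errno.ENODATA:  # ENODATA: attribute was not set
                raise


def get_file_tags(path: str) -> set[str]:
    """Read tags back; a file without the attribute is untagged."""
    try:
        raw = os.getxattr(path, TAG_ATTR)
    except OSError as exc:
        if exc.errno == errno.ENODATA:  # no tags stored yet
            return set()
        raise  # e.g. ENOTSUP on filesystems without user xattrs
    return {t for t in raw.decode("utf-8").split("\n") if t}
```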
> 3. what happens with duplicates of emails -- I would assume that
> the message id would be the key to match the tag data to the
> message. In this system a duplicate of a message could not have
> a different set of tags from the original (not that this would
> necessarily be desirable.)
Duplicates need folders, and tags and folders are somewhat at odds
with each other. I mean, you can represent a folder hierarchy with
tags (and more), and if you have tags and folders, you are
potentially introducing a level of confusion/ambiguity that we don't
want in the first place. Maybe the ideal solution doesn't need
folders anymore (and IMAP-compatible (Maildir) subfolders have
always been a hack anyway).
There are also two types of duplicates: copies and links. The former
can diverge, the latter can't. I don't really see a reason for
either. It's not like you need to copy a mail before you edit it,
and I don't see a real reason for linking, assuming that the primary
means of browsing will be tag-searches anyway.
Duplicates always make me think of content addressing, like Git's
object cache. We could store the content hash of a message in its
filename, and also use the hash to index into the tag database.
I think that would be much cleaner than message IDs, and would make
handling true duplicates (links) much easier, while copies (diverged
ex-duplicates) would also be taken care of automatically.
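A sketch of content addressing in Python, using Git's blob hashing (SHA-1 over "blob <length>", a NUL byte and the content) as the hash; the Maildir filename scheme and the in-memory tag dict are hypothetical. Byte-identical copies land on the same key, and a copy that is edited gets a new one. The catch is that any byte-level change, even converting line endings, also produces a new key.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object ID Git would give `data` as a blob: SHA-1 of header + content."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def maildir_name(raw: bytes, flags: str = "") -> str:
    """Hypothetical Maildir filename that embeds the content hash."""
    return f"{git_blob_id(raw)}:2,{flags}"


# Tag database keyed by content hash instead of Message-ID.
tags_by_hash: dict[str, set[str]] = {}


def add_tags(raw: bytes, *tags: str) -> None:
    tags_by_hash.setdefault(git_blob_id(raw), set()).update(tags)
```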
> Your mention of potential leakage (aka inadvertent disclosure of
> tag data) is real -- but only if the client used to bounce/forward
> is not the one to tag the message (one would assume that if
> a client can tag, it can know to exclude the tags in a bounce.)
True, and it's probably the minority of people using multiple
clients. But those who do might also manipulate mail with sed and
use sendmail directly.
I don't think we can successfully enhance RFC 5321 to make MTAs
always ditch the Tags:-header.
> Mail.app -- which I am plugging into does not forward headers --
ew! ;) (I think one should be able to forward pristine mails)
> though it will include all headers in a bounce -- but chances are
> you aren't tagging messages you are bouncing.:)
That chance might well be very low. I bounce/forward-as-attachment
a lot of mail from the past to make it easier for others to
establish context.
> The performance issue is very real -- because it means that
> somehow messages have to be rewritten to the IMAP server -- IMAP
> doesn't have a mechanism AFAIK for updates.
Not even UIDPLUS?
http://wiki.dovecot.org/FeatUIDPLUS
> Additionally, IMAP doesn't have a mechanism for simply replacing
> one message data with another -- a new message must be written and
> the old message must be deleted and the message IMAP UID will
> change, and the client will have to deal with this especially if
> it caches the messages.
Yes, I am experiencing this pain regularly, since I currently use
a lot of message rewriting as part of my workflow — one of the
reasons why I'd like to find an alternative.
> Also GMAIL IMAP is an issue-
Yeah, I bet. Is there anyone who doesn't think that that's Google's
problem, not ours, though?
--
martin | http://madduck.net/ | http://two.sentenc.es/
"there's someone in my head but it's not me."
-- pink floyd, the dark side of the moon, 1972
spamtraps: madduck.bogus@madduck.net
* Re: Idea for storing tags
From: Scott Morrison @ 2010-01-13 5:39 UTC
To: mailtags discussion list; +Cc: notmuch discussion list
On 2010-01-12, at 8:24 PM, martin f krafft wrote:
> also sprach Scott Morrison <smorr@indev.ca> [2010.01.12.1711 +1300]:
>> 1. synchronization of tag data with emails -- if they are in
>> a subfolder then it presents the issue of maintaining this
>> subfolder when managing emails (moving, deleting, duplicating etc)
>> and any clients unaware of the .tag folder are likely to cause
>> a breakage in the tag-data/message association. One way of doing
>> this is to have a global .tag folder.
>
> A global .tag folder indexed by e.g. message ID, as you state later,
> would probably allow for this. Or a file-per-tag design. We'd have
> to think carefully about pros and cons for each.
>
> When thinking about this, I always have to remind myself that we are
> targetting this at a design that has indexed search. If that weren't
> the case, searches would be incredibly expensive.
>
> Maybe a better approach would be content addressing (see below).
Content hashing -- good idea (and not something that had occurred to me before) -- better than Message-ID, as I believe there are still some MUAs/MTAs that allow messages without message IDs. The only potential issue is that it then becomes critical to protect the message source against encoding changes, though that shouldn't be too hard to avoid.
>
>> 2. what happens if that message is archived or moved to an
>> exclusively local cache -- eg. Mail.app on OS X can easily move
>> IMAP messages to a folder resident on the local computer?
>
> Well, if the target can store tags, then ideally the MUA should know
> how to transfer them along.
>
> Maybe the right thing to do would be to use extended attributes
> (which are stored in the inode!), even if they may not be
> universally supported yet. If our solution scales, then this might
> lead to a significant increase in xattr adoption.
The problem with anything that is not universally supported is that, for a package that is to appeal to a wide userbase, most don't know and don't care about the particulars of this IMAP server vs. that IMAP server. All they know is that for some reason it doesn't work with account X -- which leads to support headaches.
>
>> 3. what happens with duplicates of emails -- I would assume that
>> the message id would be the key to match the tag data to the
>> message. In this system a duplicate of a message could not have
>> a different set of tags from the original (not that this would
>> necessarily be desirable.)
>
> Duplicates need folders, and tags and folders are somewhat at odds
> with each other. I mean, you can represent a folder hierarchy with
> tags (and more), and if you have tags and folders, you are
> potentially introducing a level of confusion/ambiguity that we don't
> want in the first place. Maybe the ideal solution doesn't need
> folders anymore (and IMAP-compatible (Maildir) subfolders have
> always been a hack anyway).
>
> There are also two types of duplicates: copies and links. The former
> can diverge, the latter can't. I don't really see a reason for
> either. It's not like you need to copy a mail before you edit it,
> and I don't see a real reason for linking, assuming that the primary
> means of browsing will be tag-searches anyway.
>
> Duplicates always make me think of content addressing, like Git's
> object cache. We could store the content hash of a message in its
> filename, and also use the hash to index into the tag database.
> I think that would be much cleaner than message IDs, and would make
> handling true duplicates (links) much easier, while copies (diverged
> ex-duplicates) would also be taken care of automatically.
I agree that conceptually duplicates should be buried but end users do have "peculiar" organization systems.
>
> -snip-
>> The performance issue is very real -- because it means that
>> somehow messages have to be rewritten to the IMAP server -- IMAP
>> doesn't have a mechanism AFAIK for updates.
>
> Not even UIDPLUS?
> http://wiki.dovecot.org/FeatUIDPLUS
From my reading, UIDPLUS doesn't allow delta modification of a message on the server (i.e. writing back just a portion of a message) -- you still have to write the whole thing back, and that can mean real bandwidth issues for some messages.
>
>> Additionally, IMAP doesn't have a mechanism for simply replacing
>> one message data with another -- a new message must be written and
>> the old message must be deleted and the message IMAP UID will
>> change, and the client will have to deal with this especially if
>> it caches the messages.
>
> Yes, I am experiencing this pain regularly, since I currently use
> a lot of message rewriting as part of my workflow — one of the
> reasons why I'd like to find an alternative.
>
>> Also GMAIL IMAP is an issue-
>
> Yeah, I bet. Is there anyone who doesn't think that that's Google's
> problem, not ours, though?
>
Call it Google's problem if you like -- but when I have a product that doesn't work with Gmail IMAP, there are a lot of potential users who don't care about server peculiarities and would rather just have it work.
* Re: Idea for storing tags
From: martin f krafft @ 2010-01-13 5:52 UTC
To: mailtags discussion list, notmuch discussion list
also sprach Scott Morrison <smorr@indev.ca> [2010.01.13.1752 +1300]:
> The problem with anything that is not universally supported is
> that for a package that is to appeal to a wide userbase, most
> don't know and don't care about the particulars of this IMAP
> server vs. that IMAP server. All they know is that for some reason
> it doesn't work with account X -- which leads to support
> headaches.
[...]
> Call it Google's problem if you like -- but when I have a product
> that doesn't work with Gmail IMAP, there are a lot of potential
> users who don't care about server peculiarities and would rather
> just have it work.
Well, the way I see it: you cannot change all IMAP servers at once,
and you certainly cannot change Google. If it's possible to
implement tagging for email (dare say semantic e-mail) with standard
means (where standard means sub-standard, as exemplified by your
previous GMail IMAP example), then that's the best way, but if that
can't happen then we ought to try a better way. Should we find
a solution then, by the rate of standardisation on the 'Net, maybe
my grandchildren will finally be able to do proper e-mail. ;)
> I agree that conceptually duplicates should be buried but end
> users do have "peculiar" organization systems.
I think tags should help abstract e-mail away from underlying
storage and I'd love that to be a goal.
> From my reading, UIDPLUS doesn't allow delta modification of
> a message on the server (i.e. writing back just a portion of
> a message) -- you still have to write the whole thing back, and
> that can mean real bandwidth issues for some messages.
Absolutely. It would indeed be better if you could just send
changes.
I just sent a blank mail to
imap-protocol-subscribe@mailman.u.washington.edu
and have started browsing the archives. So far, there's not really
anything relevant.
Anyway, looking back at the RFC on keywords, it's not exactly
encouraging:
   A keyword is defined by the server implementation. Keywords do not
   begin with "\". Servers MAY permit the client to define new
   keywords in the mailbox (see the description of the PERMANENTFLAGS
   response code for more information).
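Going by the RFC 3501 grammar, a keyword is an atom: one or more printable ASCII characters, excluding space and the atom-specials ( ) { % * " \ and ]. A small Python check of whether a tag could be stored as a keyword at all, as a sketch; server-side length limits, and whether PERMANENTFLAGS contains \* (meaning the client may create new keywords), are not checked here.

```python
# Characters RFC 3501 excludes from atoms besides SP and controls:
# "(" ")" "{", list-wildcards "%" "*", quoted-specials DQUOTE "\",
# and resp-specials "]".
ATOM_SPECIALS = frozenset('(){%*"\\]')


def is_imap_keyword(tag: str) -> bool:
    """True if `tag` is syntactically valid as an IMAP keyword (an atom)."""
    return bool(tag) and all(
        0x21 <= ord(ch) <= 0x7E and ch not in ATOM_SPECIALS for ch in tag
    )
```

So "from-idiots" would pass, while a tag with a space, a non-ASCII letter or a leading backslash would need some escaping scheme first.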
Anyway, I'll try to untangle the various issues re:IMAP we've been
seeing, write mails for each, and hopefully get to the point where
I can enquire about IMAPv5. ;)
--
martin | http://madduck.net/ | http://two.sentenc.es/
the unix philosophy basically involves
giving you enough rope to hang yourself.
and then some more, just to be sure.
spamtraps: madduck.bogus@madduck.net
* Re: Idea for storing tags
From: Carl Worth @ 2010-01-14 1:32 UTC
To: martin f krafft, mailtags discussion list; +Cc: notmuch discussion list
On Tue, 12 Jan 2010 11:19:09 +1300, martin f krafft <madduck@madduck.net> wrote:
> 1. External database, which has the downside of not being
> synchronisable with standard IMAP, like the rest of your mail
> (assuming you use IMAP). Also, it's possible for mailstore and
> database to get out of sync.
Yes. This approach requires some external means of synchronizing the
tags from one system to another.
I don't understand what it would mean to have the mailstore and the
database out of synch here. This approach doesn't have the tags in the
mailstore by definition, right?
> How about using pseudo-mails stored in Maildir and synchronised by
> IMAP? E.g. every folder could have a subfolder .TAGS and if we find
> a way to smartly pair messages between parent and subfolder, we'd
> have a tag store alongside the mailstore it refers to, but without
> the danger of leakage, and without having to rewrite messages.
...
> Anyway, the idea is out now. Thoughts?
There are a couple of problems that I don't see addressed at all with
this approach. The first is that there's not a one-to-one mapping
between messages and files in the mail store. (I'm CCed on a lot of list
mail meaning that I have multiple files in my mail store for a single
message.)
Second, the only reason I would be interested in synchronizing mail
between two systems is so that I could manipulate the tag data in
multiple places, (that is, remove the "unread" tag whether on my
network-disconnected laptop or via web-mail when away from my
laptop). Using imap for synchronizing a file of tags within the mail
store gives you no mechanism for doing any sort of conflict resolution,
right? (Which I think in almost all cases is going to be quite trivial
if there's a chance for a program to resolve it.)
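Carl's point that most of these conflicts are trivial for a program to resolve can be made concrete. Below is a minimal sketch, assuming each machine remembers the tag set from its last successful synchronization as a common base. `merge_tags` is a hypothetical helper, not a notmuch API. A tag added on either side is kept, and a tag removed on either side is dropped.

```python
def merge_tags(base: set, ours: set, theirs: set) -> set:
    """Three-way merge of one message's tags (hypothetical helper).

    base   -- tags as of the last successful sync (assumed to be recorded)
    ours   -- tags on this machine now
    theirs -- tags on the other machine now
    """
    added = (ours - base) | (theirs - base)      # added on either side
    removed = (base - ours) | (base - theirs)    # removed on either side
    # added and removed are disjoint (added tags are never in base),
    # so the order of these two set operations does not matter.
    return (base - removed) | added


# The offline laptop drops "unread"; webmail adds "todo" in the meantime.
base = {"inbox", "unread"}
laptop = {"inbox"}
webmail = {"inbox", "unread", "todo"}
print(sorted(merge_tags(base, laptop, webmail)))  # ['inbox', 'todo']
```

The base is what makes this work. Without it, "removed on the laptop" and "never added on webmail" look identical.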
So it sounds to me like we're going to need *something* custom for doing
the synchronization, (to handle modifications on both ends). At which
point there's only disadvantages to keeping the data inside the
mailstore, and there's also no disadvantage left to keeping the data
inside a database. [*]
[*] Though, I think a plain-text file with tags managed with something
like git (and perhaps a custom merger) could save a lot of work. Or
perhaps a plain-text journal of tag manipulations on either end that
could be replayed on the other.
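The journal idea in the footnote could look roughly like the sketch below. It is only an illustration. The line format `<timestamp> <+|-><tag> <message-id>` and the names `TagOp`, `parse_line` and `replay` are invented here. Ordering entries by timestamp also assumes that the machines' clocks are reasonably close.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True, order=True)
class TagOp:
    timestamp: float   # when the change was made; assumes roughly synchronized clocks
    message_id: str
    op: str            # "+" adds the tag, "-" removes it
    tag: str


def parse_line(line: str) -> TagOp:
    """Parse '<timestamp> <+|-><tag> <message-id>' (an invented format)."""
    ts, change, message_id = line.split(maxsplit=2)
    if change[:1] not in ("+", "-") or len(change) < 2:
        raise ValueError(f"bad journal line: {line!r}")
    return TagOp(float(ts), message_id.strip(), change[0], change[1:])


def replay(journal: Iterable[TagOp],
           tags: Optional[Dict[str, Set[str]]] = None) -> Dict[str, Set[str]]:
    """Apply journal entries in timestamp order on top of a copy of `tags`."""
    state = {mid: set(t) for mid, t in (tags or {}).items()}
    for entry in sorted(journal):
        current = state.setdefault(entry.message_id, set())
        if entry.op == "+":
            current.add(entry.tag)
        elif entry.op == "-":
            current.discard(entry.tag)
        else:
            raise ValueError(f"unknown op {entry.op!r}")
    return state


# Each side appends to its own journal; both are replayed together.
laptop = [parse_line("1263432100 -unread 87abc@example.org")]
webmail = [parse_line("1263432000 +todo 87abc@example.org")]
state = replay(laptop + webmail, {"87abc@example.org": {"inbox", "unread"}})
print(sorted(state["87abc@example.org"]))  # ['inbox', 'todo']
```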
-Carl
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
* Re: Idea for storing tags
2010-01-13 5:39 ` Scott Morrison
2010-01-13 5:52 ` martin f krafft
@ 2010-01-14 1:37 ` Carl Worth
1 sibling, 0 replies; 42+ messages in thread
From: Carl Worth @ 2010-01-14 1:37 UTC (permalink / raw)
To: Scott Morrison, mailtags discussion list; +Cc: notmuch discussion list
[-- Attachment #1: Type: text/plain, Size: 1004 bytes --]
On Wed, 13 Jan 2010 00:39:14 -0500, Scott Morrison <smorr@indev.ca> wrote:
> > Maybe a better approach would be content addressing (see below).
>
> Content hashing -- good Idea (& not something that has hit me before)
> -- better than Message-Id as I believe there are still some MUA /MTAs
> that allow messages without message ids. The only potential issue
> with this is that it is critical then to preserve the message source
> against encoding changes though that shouldn't be too hard to avoid.
Another problem with content-based naming for messages is that most of
the messages in my mail store that I consider duplicates don't actually
have identical content. (One is sent directly to me via CC and the other
is sent by the mailing-list software *after* appending a footer to the
message.)
That said, notmuch already does use a sha-1 sum as the message
identifier for any message that does not have a valid Message-ID
header. So there's definitely a place for this.
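The sketch below shows what falling back from Message-ID to a content hash might look like. It is an illustration, not notmuch's code: the `sha1-` prefix and the function name are made up. It also demonstrates Carl's caveat. A CC copy and a list copy that carries a footer share a Message-ID but not a hash.

```python
import hashlib
from email.parser import BytesParser


def message_key(raw: bytes) -> str:
    """Identify a message by Message-ID, else by a SHA-1 of its bytes (hypothetical)."""
    headers = BytesParser().parsebytes(raw, headersonly=True)
    mid = str(headers.get("Message-ID", "")).strip()
    if len(mid) > 2 and mid.startswith("<") and mid.endswith(">"):
        return mid[1:-1]
    # No usable Message-ID: any change to the bytes (re-encoding, a footer) changes this key.
    return "sha1-" + hashlib.sha1(raw).hexdigest()


cc_copy = b"Message-ID: <1234@example.org>\r\nSubject: hi\r\n\r\nbody\r\n"
list_copy = cc_copy + b"--\r\nlist footer\r\n"
print(message_key(cc_copy) == message_key(list_copy))                  # True
print(hashlib.sha1(cc_copy).digest() == hashlib.sha1(list_copy).digest())  # False
```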
-Carl
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
* Re: Idea for storing tags
2010-01-14 1:32 ` Carl Worth
@ 2010-01-14 8:04 ` martin f krafft
2010-01-14 22:24 ` Carl Worth
0 siblings, 1 reply; 42+ messages in thread
From: martin f krafft @ 2010-01-14 8:04 UTC (permalink / raw)
To: Carl Worth; +Cc: notmuch discussion list, mailtags discussion list
[-- Attachment #1: Type: text/plain, Size: 2974 bytes --]
also sprach Carl Worth <cworth@cworth.org> [2010.01.14.1432 +1300]:
> Yes. This approach requires some external means of synchronizing the
> tags from one system to another.
>
> I don't understand what it would mean to have the mailstore and the
> database out of synch here. This approach doesn't have the tags in the
> mailstore by definition, right?
You might have marked a message 'read' on one machine and if the two
get out of sync on another machine, you might have the same message
unread there.
> > How about using pseudo-mails stored in Maildir and synchronised by
> > IMAP? E.g. every folder could have a subfolder .TAGS and if we find
> > a way to smartly pair messages between parent and subfolder, we'd
> > have a tag store alongside the mailstore it refers to, but without
> > the danger of leakage, and without having to rewrite messages.
> ...
> > Anyway, the idea is out now. Thoughts?
>
> There are a couple of problems that I don't see addressed at all with
> this approach. The first is that there's not a one-to-one mapping
> between messages and files in the mail store. (I'm CCed on a lot of list
> mail meaning that I have multiple files in my mail store for a single
> message.)
Shouldn't this just be solved? I've had formail+procmail delete my
duplicates for 10+ years, and while I don't like the fact that
I usually get the CC before the list mail, and thus cannot filter on
Delivered-To, I have never looked back.
> Second, the only reason I would be interested in synchronizing mail
> between two systems is so that I could manipulate the tag data in
> multiple places, (that is, remove the "unread" tag whether on my
> network-disconnected laptop or via web-mail when away from my
> laptop). Using imap for synchronizing a file of tags within the mail
> store gives you no mechanism for doing any sort of conflict resolution,
> right? (Which I think in almost all cases is going to be quite trivial
> if there's a chance for a program to resolve it.)
I have not thought about this, but you are right. IMAP does not
really allow for conflict resolution, which may well be *the* reason
why you cannot update existing messages.
> [*] Though, I think a plain-text file with tags managed with
> something like git (and perhaps a custom merger) could save a lot
> of work. Or perhaps a plain-text journal of tag manipulations on
> either end that could be replayed on the other.
Git is good at conflict resolution if run interactively, but [0]
still makes me question whether it can ever take the place of IMAP.
However, Asheesh Laroia, who has floated the idea of Git-for-mail at
DebConf8 already, has some ideas and hopefully will soon reply to my
mail [0], which I just bounced.
0. http://notmuchmail.org/pipermail/notmuch/2010/001114.html
--
martin | http://madduck.net/ | http://two.sentenc.es/
apt-get source --compile gentoo
spamtraps: madduck.bogus@madduck.net
[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
* Re: Potential problem using Git for mail (was: Idea for storing tags)
2010-01-12 4:51 ` Potential problem using Git for mail (was: Idea for storing tags) martin f krafft
2010-01-12 19:38 ` Jameson Rollins
@ 2010-01-14 8:12 ` Asheesh Laroia
2010-01-14 20:37 ` martin f krafft
1 sibling, 1 reply; 42+ messages in thread
From: Asheesh Laroia @ 2010-01-14 8:12 UTC (permalink / raw)
To: martin f krafft; +Cc: notmuch
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2836 bytes --]
On Tue, 12 Jan 2010, martin f krafft wrote:
> If the MDA delivers to Git, then potentially, you might get into a
> situation where you cannot write your own changes back to the repo. This
> is also a DoS scenario: I'll just keep sending you e-mail, and if I
> manage to pass your mail filters, I'll basically commit to your mail
> repository at regular intervals. Say those are 5 seconds. In order for
> you to write updates to the repo, e.g. to update tags, then you would
> need to pull, rebase, and push all within 5 seconds, for otherwise you'd
> try to push non-fast-forwards.
Sure. But the MDA doesn't need to do the commit immediately. Since
(presumably) we're using Maildir, the MDA on the mail receiving server is
going to generate filenames that won't cause conflicts. So it's okay to
leave the files uncommitted.
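Maildir filenames do not collide because of the naming convention in the Maildir specification. Each name combines the delivery time with enough process and host detail to be unique. The sketch below loosely follows the `<seconds>.M<usec>P<pid>Q<n>.<host>` form. `maildir_unique_name` is a hypothetical helper, and real MDAs differ in the details.

```python
import itertools
import os
import socket
import time
from typing import Optional

_deliveries = itertools.count()  # per-process delivery counter (the "Q" part)


def maildir_unique_name(host: Optional[str] = None) -> str:
    """Build a Maildir base filename: <seconds>.M<usec>P<pid>Q<n>.<host>."""
    host = host or socket.gethostname()
    # The spec escapes "/" and ":" in the host part, since ":" starts the info suffix.
    host = host.replace("/", r"\057").replace(":", r"\072")
    now = time.time()
    seconds = int(now)
    usec = int((now - seconds) * 1_000_000)
    return f"{seconds}.M{usec}P{os.getpid()}Q{next(_deliveries)}.{host}"
```

Two deliveries from one process in the same microsecond still differ in `Q`. Different processes differ in `P`, and different servers differ in the host part. This is why uncommitted files from several deliveries can sit side by side safely.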
If that's too scary, then have the MDA deliver to its own git branch with
its own checkout. Then, if you can force linearity with a lock (!), your
client can have a special "lock the repo and push" command. Your remote
MUA could even ask the MDA to lock the Maildir while it does a merge and
then pushes that, and then the MDA can go back to dequeuing messages from
the MTA into the Maildir.
Not the beautiful lockless world the purists want, but I'm okay with that.
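The following is a minimal sketch of the lock described above. It uses an exclusive-create lock file so that the MDA and a "lock the repo and push" command take turns. The lock path and the function name are hypothetical. Stale locks left behind by a crashed process are not handled.

```python
import contextlib
import os
import time


@contextlib.contextmanager
def maildir_lock(path: str, timeout: float = 10.0, poll: float = 0.1):
    """Hold an exclusive lock file at `path` for the duration of a with-block."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the file exists: only one holder at a time.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {path}")
            time.sleep(poll)
    try:
        try:
            os.write(fd, f"{os.getpid()}\n".encode())  # record the holder for debugging
        finally:
            os.close(fd)
        yield
    finally:
        os.unlink(path)


# Hypothetical usage by both parties:
# with maildir_lock(os.path.expanduser("~/Maildir/.sync.lock")):
#     ...  # MDA: move queued mail in; client: merge, then push
```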
> This is a bit unrealistic, surely, but there's a real annoyance in it:
> you'd have to pull/rebase/push until a push succeeds — until you found a
> time window between pull and push during which the MDA didn't write to
> the repo. This might take a long time. If this happens in the background
> by Cron, it's not a real concern, but if this becomes a UI issue, I
> wouldn't know how to handle it.
It's not entirely unreasonable. Cron caused issues like that for me when I
tracked my Maildir in git.
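The pull/rebase/push cycle described above, repeated until a push lands in a quiet window, is a retry loop. The sketch below takes the steps as injected callables so that it can be tested without a repository. In practice `pull_rebase` might run `git pull --rebase`, and `push` might return whether `git push` exited with status 0. The backoff parameters are arbitrary assumptions.

```python
import random
import time
from typing import Callable


def sync_until_pushed(
    pull_rebase: Callable[[], None],
    push: Callable[[], bool],
    max_attempts: int = 10,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Repeat pull/rebase/push until a push succeeds; return the attempt number."""
    for attempt in range(1, max_attempts + 1):
        pull_rebase()      # bring in whatever the MDA committed meanwhile
        if push():         # False e.g. when the push is rejected as non-fast-forward
            return attempt
        if attempt < max_attempts:
            # Randomized exponential backoff so we stop racing the MDA in lockstep.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(random.uniform(0, delay))
    raise RuntimeError(f"push still rejected after {max_attempts} attempts")


# Possible wiring to git (assumes the current directory is the mail repository):
# import subprocess
# sync_until_pushed(
#     lambda: subprocess.run(["git", "pull", "--rebase"], check=True),
#     lambda: subprocess.run(["git", "push"]).returncode == 0,
# )
```

Backoff does not remove the race. It only makes a long losing streak less likely, which is why a lock such as the one Asheesh suggests may still be preferable for interactive use.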
I'm just learning about notmuchmail.org, but I'll keep listening here.
Preferably CC: me on replies to this mail.
I will say, I'm interested in an email setup with working IMAP on at
least one side.
There's one other bad race I ran into when using git to manage my
Maildirs. I was using Dovecot to serve my Maildir to an IMAP client,
alpine. I separately did a "git merge" from origin/master, where the
remote MTA had an MDA delivering messages and a layer on top of that
committed them.
When I did the "git merge", git would create the Maildir files in
~/Maildir/cur/... non-atomically. Dovecot would notice the file in
~/Maildir/cur/ and think, "This file must be ready!" So it would parse it
even though git hadn't finished writing it. This caused me to only see
partial headers in Alpine since Dovecot parsed it before it was a complete
message.
That kind of sucked.
-- Asheesh.
--
Almost anything derogatory you could say about today's software design
would be accurate.
-- K. E. Iverson
* Re: Potential problem using Git for mail (was: Idea for storing tags)
2010-01-14 8:12 ` Asheesh Laroia
@ 2010-01-14 20:37 ` martin f krafft
2010-01-21 6:28 ` Asheesh Laroia
0 siblings, 1 reply; 42+ messages in thread
From: martin f krafft @ 2010-01-14 20:37 UTC (permalink / raw)
To: Asheesh Laroia; +Cc: notmuch
[-- Attachment #1: Type: text/plain, Size: 1500 bytes --]
also sprach Asheesh Laroia <asheesh@asheesh.org> [2010.01.14.2112 +1300]:
> Sure. But the MDA doesn't need to do the commit immediately. Since
> (presumably) we're using Maildir, the MDA on the mail receiving
> server is going to generate filenames that won't cause conflicts.
> So it's okay to leave the files uncommitted.
So when does the commit happen?
> When I did the "git merge", git would create the Maildir files in
> ~/Maildir/cur/... non-atomically.
This might be something that the Git people could address if it was
brought up on the mailing list. Then again, it might not be possible
without going via a temporary file, which I doubt will fly.
I suppose that I never actually considered merges on the IMAP server
side, but obviously the IMAP server has to work off a clone, and
that means it needs to merge.
> Dovecot would notice the file in ~/Maildir/cur/ and think, "This
> file must be ready!" So it would parse it even though git hadn't
> finished writing it. This caused me to only see partial headers in
> Alpine since Dovecot parsed it before it was a complete message.
I wonder if a custom merge driver could address this to properly use
…/tmp/ to assemble the message and only then move it.
--
martin | http://madduck.net/ | http://two.sentenc.es/
"this week dragged past me so slowly;
the days fell on their knees..."
-- david bowie
spamtraps: madduck.bogus@madduck.net
[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
* Re: Idea for storing tags
2010-01-14 8:04 ` martin f krafft
@ 2010-01-14 22:24 ` Carl Worth
2010-01-14 22:32 ` martin f krafft
0 siblings, 1 reply; 42+ messages in thread
From: Carl Worth @ 2010-01-14 22:24 UTC (permalink / raw)
To: martin f krafft; +Cc: notmuch discussion list, mailtags discussion list
[-- Attachment #1: Type: text/plain, Size: 2250 bytes --]
On Thu, 14 Jan 2010 21:04:21 +1300, martin f krafft <madduck@madduck.net> wrote:
> You might have marked a message 'read' on one machine and if the two
> get out of sync on another machine, you might have the same message
> unread there.
That's a different issue though. With two databases there's clearly the
opportunity for the two databases to be out of synch.
But you talked about the database being out of synch with respect to the
mailstore. And that's something I just don't understand, (given the
assumption that all tags are stored in the database---which was the
explicit description of the case of interest).
> Shouldn't this just be solved? I've had formail+procmail delete my
> duplicates for 10+ years, and while I don't like the fact that
> I usually get the CC before the list mail, and thus cannot filter on
> Delivered-To, I have never looked back.
Notmuch has access to all the information it needs to allow you to
delete the CC version once the list mail arrives. So you could do
notmuch-based deletion now and avoid losing the Delivered-To header if
you want.
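Carl's suggestion can be sketched independently of notmuch. The approach is to group files by Message-ID and, where a mailing-list copy (one with a `List-Id` header) exists, mark the other copies for deletion. This way the list copy and its `Delivered-To` header survive. The function name and the dict-of-bytes input are assumptions for illustration, not notmuch's API.

```python
from collections import defaultdict
from email.parser import BytesParser
from typing import Dict, List


def redundant_cc_copies(messages: Dict[str, bytes]) -> List[str]:
    """Return paths of non-list copies of messages that also arrived via a list.

    messages -- file path -> raw message bytes (hypothetical input shape)
    """
    parser = BytesParser()
    copies = defaultdict(list)  # Message-ID -> [(path, came_via_list)]
    for path, raw in messages.items():
        headers = parser.parsebytes(raw, headersonly=True)
        mid = str(headers.get("Message-ID", "")).strip()
        if mid:
            copies[mid].append((path, headers.get("List-Id") is not None))
    doomed = []
    for group in copies.values():
        # Only delete once a list copy exists, so nothing is lost if it never arrives.
        if len(group) > 1 and any(via_list for _, via_list in group):
            doomed.extend(path for path, via_list in group if not via_list)
    return sorted(doomed)
```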
> > [*] Though, I think a plain-text file with tags managed with
> > something like git (and perhaps a custom merger) could save a lot
> > of work. Or perhaps a plain-text journal of tag manipulations on
> > either end that could be replayed on the other.
>
> Git is good at conflict resolution if run interactively, but [0]
> still makes me question whether it can ever take the place of IMAP.
> However, Asheesh Laroia, who has floated the idea of Git-for-mail at
> DebConf8 already, has some ideas and hopefully will soon reply to my
> mail [0], which I just bounced.
>
> 0. http://notmuchmail.org/pipermail/notmuch/2010/001114.html
Using git for mail is an interesting idea, but not what I was actually
proposing here.
I think that synchronizing the mail store and synchronizing the tags
information are tasks that have different requirements, and for which we
may well want different tools.
So I was talking about using imap (or rsync, or what have you) for
copying the mailstore, and then having something with a bit more
domain-specific awareness for doing the synchronization of the tags
data.
-Carl
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
* Re: Idea for storing tags
2010-01-14 22:24 ` Carl Worth
@ 2010-01-14 22:32 ` martin f krafft
0 siblings, 0 replies; 42+ messages in thread
From: martin f krafft @ 2010-01-14 22:32 UTC (permalink / raw)
To: mailtags discussion list, notmuch discussion list
[-- Attachment #1: Type: text/plain, Size: 2593 bytes --]
also sprach Carl Worth <cworth@cworth.org> [2010.01.15.1124 +1300]:
> > You might have marked a message 'read' on one machine and if the two
> > get out of sync on another machine, you might have the same message
> > unread there.
>
> That's a different issue though. With two databases there's clearly the
> opportunity for the two databases to be out of synch.
>
> But you talked about the database being out of synch with respect to the
> mailstore. And that's something I just don't understand, (given the
> assumption that all tags are stored in the database---which was the
> explicit description of the case of interest).
Yes, we are talking about the situation where the tagstore is
separate from the mailstore, and that they are both synchronised
with a server, or between machines, separately. If for some reason
you only synchronise the mailstore — say because the connection
drops before the sync of the tagstore completes — then you end up
with an out-of-sync situation, because the mailstore-sync will have
pulled in a new message, but not the associated tags. So if you had
already read this message on another machine and tagged it 'done',
then it would show up on this machine as 'new' without the 'done'
tag, because the tags were not synchronised.
The only way to really solve this is by transferring a message and
its tags in a transactional way.
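On the receiving side, "transactional" could mean applying a message and its tags in a single database transaction. If the connection drops mid-way, no message is left without its tags. The sketch below uses SQLite purely as a stand-in (notmuch itself keeps its tags in a Xapian database), and the table layout and function names are hypothetical. It covers only the apply step, not the network transfer.

```python
import sqlite3
from typing import Iterable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS messages (id TEXT PRIMARY KEY, body BLOB NOT NULL);
        CREATE TABLE IF NOT EXISTS tags (
            id  TEXT NOT NULL REFERENCES messages(id),
            tag TEXT NOT NULL,
            PRIMARY KEY (id, tag)
        );
        """
    )
    return db


def receive(db: sqlite3.Connection, message_id: str, body: bytes,
            tags: Iterable[str]) -> None:
    """Store a message and its tags together, or not at all."""
    with db:  # commits if the block succeeds, rolls back if anything raises
        db.execute("INSERT OR REPLACE INTO messages (id, body) VALUES (?, ?)",
                   (message_id, body))
        db.execute("DELETE FROM tags WHERE id = ?", (message_id,))
        db.executemany("INSERT INTO tags (id, tag) VALUES (?, ?)",
                       [(message_id, t) for t in tags])
```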
> > Shouldn't this just be solved? I've had formail+procmail delete my
> > duplicates for 10+ years, and while I don't like the fact that
> > I usually get the CC before the list mail, and thus cannot filter on
> > Delivered-To, I have never looked back.
>
> Notmuch has access to all the information it needs to allow you to
> delete the CC version once the list mail arrives. So you could do
> notmuch-based deletion now and avoid losing the Delivered-To header if
> you want.
Of course. I hadn't thought that far.
However, there are still benefits to formail, namely avoiding having
to run duplicates through potentially expensive spamfilters.
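The benefit of deduplicating at delivery time is that a cheap lookup runs before the expensive filter. The sketch below checks a size-limited cache of recently seen Message-IDs, similar in spirit to the idcache that `formail -D` keeps, before invoking the spam filter. `SeenCache`, `deliver` and the callables are hypothetical. A real MDA would also persist the cache to disk between deliveries.

```python
from collections import OrderedDict
from typing import Callable


class SeenCache:
    """Remember the most recent `max_entries` Message-IDs (in memory only)."""

    def __init__(self, max_entries: int = 8192):
        self.max_entries = max_entries
        self._ids = OrderedDict()

    def seen_before(self, message_id: str) -> bool:
        if message_id in self._ids:
            self._ids.move_to_end(message_id)
            return True
        self._ids[message_id] = None
        if len(self._ids) > self.max_entries:
            self._ids.popitem(last=False)  # forget the oldest ID
        return False


def deliver(message_id: str, raw: bytes, cache: SeenCache,
            is_spam: Callable[[bytes], bool], store: Callable[[bytes], None]) -> str:
    # The cheap lookup runs first, so duplicates never reach the expensive filter.
    if message_id and cache.seen_before(message_id):
        return "duplicate"
    if is_spam(raw):
        return "spam"
    store(raw)
    return "delivered"
```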
> I think that synchronizing the mail store and synchronizing the
> tags information are tasks that have different requirements, and
> for which we may well want different tools.
Fair enough. Maybe I am just paranoid about the stores getting out
of sync (see above).
--
martin | http://madduck.net/ | http://two.sentenc.es/
"we all know linux is great...
it does infinite loops in 5 seconds."
-- linus torvalds
spamtraps: madduck.bogus@madduck.net
[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
* Re: Potential problem using Git for mail (was: Idea for storing tags)
2010-01-14 20:37 ` martin f krafft
@ 2010-01-21 6:28 ` Asheesh Laroia
2010-01-25 0:46 ` Git as notmuch object store (was: Potential problem using Git for mail) martin f krafft
0 siblings, 1 reply; 42+ messages in thread
From: Asheesh Laroia @ 2010-01-21 6:28 UTC (permalink / raw)
To: martin f krafft; +Cc: notmuch
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2038 bytes --]
On Fri, 15 Jan 2010, martin f krafft wrote:
> also sprach Asheesh Laroia <asheesh@asheesh.org> [2010.01.14.2112 +1300]:
>> Sure. But the MDA doesn't need to do the commit immediately. Since
>> (presumably) we're using Maildir, the MDA on the mail receiving
>> server is going to generate filenames that won't cause conflicts.
>> So it's okay to leave the files uncommitted.
>
> So when does the commit happen?
>
>> When I did the "git merge", git would create the Maildir files in
>> ~/Maildir/cur/... non-atomically.
>
> This might be something that the Git people could address if it was
> brought up on the mailing list. Then again, it might not be possible
> without going via a temporary file, which I doubt will fly.
A temporary file + rename() is the only way, as far as I know.
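For a Maildir, "temporary file + rename()" means writing the complete message under `tmp/` and then renaming it into `cur/` (or `new/`). The rename is atomic within one filesystem, so a reader such as Dovecot sees either no file or the whole message, never a partial one. The sketch below uses a hypothetical helper name. It shows the per-file behavior the thread wishes a git checkout had; it does not claim that git works this way.

```python
import os


def write_into_maildir(maildir: str, name: str, data: bytes, subdir: str = "cur") -> str:
    """Write a complete message to <maildir>/tmp/<name>, then rename it into place."""
    tmp_path = os.path.join(maildir, "tmp", name)
    final_path = os.path.join(maildir, subdir, name)
    f = open(tmp_path, "xb")  # "x": never clobber another writer's tmp file
    try:
        with f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # contents on disk before the name becomes visible
        os.rename(tmp_path, final_path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
    return final_path
```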
> I suppose that I never actually considered merges on the IMAP server
> side, but obviously the IMAP server has to work off a clone, and that
> means it needs to merge.
It's not "merge" that's unsafe; that just builds a tree in the git index
(assuming no conflicts). It's the ensuing process of git writing a tree to
the filesystem that is problematic.
I could probably actually write a wrapper that locks the Maildir while git
is operating. It would probably be specific to each IMAP server.
Note that this means git is fundamentally incompatible with Maildir, not
just IMAP servers.
>> Dovecot would notice the file in ~/Maildir/cur/ and think, "This file
>> must be ready!" So it would parse it even though git hadn't finished
>> writing it. This caused me to only see partial headers in Alpine since
>> Dovecot parsed it before it was a complete message.
>
> I wonder if a custom merge driver could address this to properly use
> …/tmp/ to assemble the message and only then move it.
I don't think a merge driver can do it for the reason stated above.
-- Asheesh.
--
I always turn to the sports pages first, which record people's accomplishments.
The front page has nothing but man's failures.
-- Chief Justice Earl Warren
* Git as notmuch object store (was: Potential problem using Git for mail)
2010-01-21 6:28 ` Asheesh Laroia
@ 2010-01-25 0:46 ` martin f krafft
2010-01-25 5:19 ` Asheesh Laroia
` (2 more replies)
0 siblings, 3 replies; 42+ messages in thread
From: martin f krafft @ 2010-01-25 0:46 UTC (permalink / raw)
To: Asheesh Laroia; +Cc: notmuch
[-- Attachment #1: Type: text/plain, Size: 3212 bytes --]
also sprach Asheesh Laroia <asheesh@asheesh.org> [2010.01.21.1928 +1300]:
> >I suppose that I never actually considered merges on the IMAP
> >server side, but obviously the IMAP server has to work off a clone,
> >and that means it needs to merge.
>
> It's not "merge" that's unsafe; that just builds a tree in the git
> index (assuming no conflicts). It's the ensuing process of git
> writing a tree to the filesystem that is problematic.
There is no way to make that atomic, I am afraid. As you say.
> I could probably actually write a wrapper that locks the Maildir
> while git is operating. It would probably be specific to each IMAP
> server.
Ouch! I'd really rather not go there.
> Note that this means git is fundamentally incompatible with
> Maildir, not just IMAP servers.
We had an idea about using Git to replace IMAP altogether, along
with making notmuch use a bare Git repository as object store. The
idea is that notmuch uses low-level Git commands to access the .git
repository (from which you can still checkout a tree tying the blobs
into a Maildir). The benefit would be compression, lower inode count
(due to packs), and backups using clones/merges.
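A message stored as a Git blob is identified by the SHA-1 of a short `blob <size>\0` header followed by its bytes. This is the same value that `git hash-object` prints. The sketch below, with a hypothetical helper name, illustrates two properties that make a bare repository attractive as an object store. Identical copies collapse into one object, and the ID can be computed without touching a repository. Carl's earlier caveat still applies: a CC copy and a footer-carrying list copy are different blobs.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object id Git assigns to `data` stored as a blob (SHA-1 object format)."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```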
You could either have the MDA write to a Git repo on the server side
and use git packs to download mail to a local clone, or one could
have e.g. offlineimap grow a Git storage backend. The interface to
notmuch would be the same.
If we used this, all the rename and delete code would be refactored
into Git and could be removed from notmuch. In addition, notmuch
could actually use Git tree objects to represent the results of
searches, and you could checkout these trees. However, deleting
messages from search results would not have any effect on the
message or its existence in other search results, much like what
happens with mairix nowadays.
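Representing a search result as a Git tree amounts to listing the matching message blobs under some file names. The sketch below computes a flat tree's object ID from scratch: entries of the form `100644 <name>\0<20-byte id>`, sorted by name and hashed with a `tree <size>\0` header. The choice of file names is an assumption. To actually store such a tree, one would use a command like `git mktree`. Because the tree only references blobs, removing an entry from a checked-out search result leaves the blob and every other tree untouched, which matches the mairix comparison.

```python
import hashlib


def git_tree_id(entries: dict) -> str:
    """Object id of a flat Git tree mapping file name -> hex blob id."""
    body = b""
    # Git orders tree entries by name bytes (subdirectories would compare as
    # "name/", but this flat tree only holds regular files).
    for name in sorted(entries, key=lambda n: n.encode()):
        if "/" in name or "\x00" in name:
            raise ValueError(f"invalid tree entry name: {name!r}")
        body += b"100644 " + name.encode() + b"\x00" + bytes.fromhex(entries[name])
    return hashlib.sha1((b"tree %d\x00" % len(body)) + body).hexdigest()
```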
I think we all kinda agreed that the Maildir flags should not be
used by notmuch and that things like Sebastian's notmuchsync should
be used if people wanted flags represented in Maildir filenames.
Instead of a Maildir checkout, notmuch could provide an interface to
browse the store contents in a way that could make it accessible to
mutt. The argument is that with 'notmuch {ls,cat,rm,…}', a mutt
backend could be trivially written. I am not sure about that, but
it's worth a try.
But there are still good reasons why you'd want to have IMAP
capability too, e.g. Webmail. Given the atomicity problems that come
from Git, maybe an IMAP server reading from the Git store would make
sense.
However, this all sounds like a lot of NIH and reinvention. It's
a bit like the marriage between the hypothetical Maildir2 and Git,
which is definitely worth pursuing. Before we embark on any of this,
however, we'd need to define the way in which Git stores mail.
Stewart, you've worked most on this so far. Would you like to share
your thoughts?
--
martin | http://madduck.net/ | http://two.sentenc.es/
"reife des mannes, das ist es,
den ernst wiedergefunden zu haben, den
man hatte als kind beim spiel."
-- friedrich nietzsche
spamtraps: madduck.bogus@madduck.net
[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
* Re: Git as notmuch object store (was: Potential problem using Git for mail)
2010-01-25 0:46 ` Git as notmuch object store (was: Potential problem using Git for mail) martin f krafft
@ 2010-01-25 5:19 ` Asheesh Laroia
2010-01-25 7:43 ` martin f krafft
2010-01-25 13:49 ` Sebastian Spaeth
2010-02-15 0:51 ` Stewart Smith
2 siblings, 1 reply; 42+ messages in thread
From: Asheesh Laroia @ 2010-01-25 5:19 UTC (permalink / raw)
To: martin f krafft; +Cc: notmuch
[-- Attachment #1: Type: TEXT/PLAIN, Size: 5131 bytes --]
On Mon, 25 Jan 2010, martin f krafft wrote:
> also sprach Asheesh Laroia <asheesh@asheesh.org> [2010.01.21.1928 +1300]:
>>> I suppose that I never actually considered merges on the IMAP server
>>> side, but obviously the IMAP server has to work off a clone, and that
>>> means it needs to merge.
>>
>> It's not "merge" that's unsafe; that just builds a tree in the git
>> index (assuming no conflicts). It's the ensuing process of git writing
>> a tree to the filesystem that is problematic.
>
> There is no way to make that atomic, I am afraid. As you say.
>
>> I could probably actually write a wrapper that locks the Maildir while
>> git is operating. It would probably be specific to each IMAP server.
>
> Ouch! I'd really rather not go there.
You say "Ouch" but you should know Dovecot *already* does this. I don't
mind interoperating with that.
See http://wiki.dovecot.org/MailboxFormat/Maildir, section "Issues with
the specification", subsection "Locking". I term this the famous readdir()
race. Without this lock, Maildir is fundamentally incompatible with IMAP
-- one Maildir-using process modifying message flags could make a
different Maildir-using process think said message is actually deleted. In
the case of temporary disappearing mails in Mutt locally, that's not the
end of the world. For IMAP, it will make the IMAP daemon (one of the
Maildir-using processes) send a note to IMAP clients saying that the
message has been deleted and expunged.
>> Note that this means git is fundamentally incompatible with Maildir, not
>> just IMAP servers.
>
> We had an idea about using Git to replace IMAP altogether, along with
> making notmuch use a bare Git repository as object store. The idea is
> that notmuch uses low-level Git commands to access the .git repository
> (from which you can still checkout a tree tying the blobs into a
> Maildir). The benefit would be compression, lower inode count (due to
> packs), and backups using clones/merges.
Sure, that makes sense to me.
> You could either have the MDA write to a Git repo on the server side and
> use git packs to download mail to a local clone, or one could have e.g.
> offlineimap grow a Git storage backend. The interface to notmuch would
> be the same.
Yeah, I generally like this.
> If we used this, all the rename and delete code would be refactored into
> Git and could be removed from notmuch. In addition, notmuch could
> actually use Git tree objects to represent the results of searches, and
> you could checkout these trees. However, deleting messages from search
> results would not have any effect on the message or its existence in
> other search results, much like what happens with mairix nowadays.
That's okay with me.
> I think we all kinda agreed that the Maildir flags should not be used by
> notmuch and that things like Sebastian's notmuchsync should be used if
> people wanted flags represented in Maildir filenames.
Aww, I like Maildir flags, but if there's a sync tool, I'm fine with that.
> Instead of a Maildir checkout, notmuch could provide an interface to
> browse the store contents in a way that could make it accessible to
> mutt. The argument is that with 'notmuch {ls,cat,rm,…}', a mutt backend
> could be trivially written. I am not sure about that, but it's worth a
> try.
Sure.
> But there are still good reasons why you'd want to have IMAP capability
> too, e.g. Webmail. Given the atomicity problems that come from Git,
> maybe an IMAP server reading from the Git store would make sense.
It wouldn't be too hard to write a FUSE filesystem that presented an
interface to a Git repository that didn't allow the contents of files to
be modified. Then Dovecot could think it's interacting with the
filesystem.
> However, this all sounds like a lot of NIH and reinvention. It's
> a bit like the marriage between the hypothetical Maildir2 and Git,
> which is definitely worth pursuing. Before we embark on any of this,
> however, we'd need to define the way in which Git stores mail.
Sure. If it were me, I'd just say, "For phase 1 of notmuch, just have git
store Maildir spools." When you need a filesystem interface for e.g.
Dovecot, have a FUSE wrapper.
See how far that can take you, and then see if version 2 is necessary.
(-:
> Stewart, you've worked most on this so far. Would you like to share your
> thoughts?
I'll listen, too.
Just don't fall into the trap of thinking Maildir is compatible with IMAP.
It's not, because as I understand things, the filesystem doesn't guarantee
that you can actually iterate across a directory's files if another
process is modifying the list of files.
I'm not sure, but maybe it's safe if you refuse to ever modify a
message's flags in the filename.
Anyway, as I see it, further hacks that aren't much worse than Dovecot's
should be considered okay, unless you have a more elegant design up your
sleeve.
If I'm slightly wrong about something, try to give me the benefit of
doubt. It's past midnight. (-:
-- Asheesh.
--
There's no real need to do housework -- after four years it doesn't get
any worse.
* Re: Git as notmuch object store (was: Potential problem using Git for mail)
2010-01-25 5:19 ` Asheesh Laroia
@ 2010-01-25 7:43 ` martin f krafft
0 siblings, 0 replies; 42+ messages in thread
From: martin f krafft @ 2010-01-25 7:43 UTC (permalink / raw)
To: notmuch
[-- Attachment #1: Type: text/plain, Size: 3224 bytes --]
also sprach Asheesh Laroia <asheesh@asheesh.org> [2010.01.25.1819 +1300]:
> You say "Ouch" but you should know Dovecot *already* does this. I
> don't mind interoperating with that.
>
> See http://wiki.dovecot.org/MailboxFormat/Maildir, section "Issues
> with the specification", subsection "Locking". I term this theQ
> famous readdir() race.
Yikes. IMAP (including dovecot) just SUCKS.
> Without this lock, Maildir is fundamentally incompatible with IMAP
> -- one Maildir-using process modifying message flags could make
> a different Maildir-using process think said message is actually
> deleted. In the case of temporary disappearing mails in Mutt
> locally, that's not the end of the world. For IMAP, it will make
> the IMAP daemon (one of the Maildir-using processes) send a note
> to IMAP clients saying that the message has been deleted and
> expunged.
[…]
> Just don't fall into the trap of thinking Maildir is compatible
> with IMAP. It's not, because as I understand things, the
> filesystem doesn't guarantee that you can actually iterate across
> a directory's files if another process is modifying the list of
> files.
This is all perfect reason to concentrate even more on designing
a store that could potentially make IMAP obsolete once and for all!
The current idea is to sync Git downstream only, and find a way to
keep multiple copies of a tagstore in sync, by way of the "server
instance" (where mail is received/delivered). Deleting messages
would then be something like setting the notmuch::deleted tag, which
clients would honour; on the server, a cleanup process would run
regularly to actually delete the blobs associated with deleted
messages. This would then propagate the next time one pulls from
Git.
Whether to store history (commit objects) or just collections (tree
objects) needs to be investigated.
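A minimal in-memory model of the deletion scheme described above may help. Only the tag name notmuch::deleted comes from this message. The `TagStore` class and its methods are hypothetical and are not notmuch or Git APIs. In this model, clients only ever add the tag, and a server-side cleanup job drops the blobs.

```python
# Hypothetical model: deletion is just a tag, cleanup happens on the server.
DELETED = "notmuch::deleted"  # tag name taken from the proposal


class TagStore:
    def __init__(self):
        self.blobs = {}  # message id -> raw message bytes
        self.tags = {}   # message id -> set of tags

    def add(self, msg_id, raw, tags=()):
        self.blobs[msg_id] = raw
        self.tags[msg_id] = set(tags)

    def delete(self, msg_id):
        # Clients never remove the blob themselves; they only mark it.
        self.tags[msg_id].add(DELETED)

    def visible(self):
        # What a client that honours the tag would show.
        return sorted(m for m, t in self.tags.items() if DELETED not in t)

    def cleanup(self):
        # Server-side job: drop blobs of messages marked as deleted.
        doomed = sorted(m for m, t in self.tags.items() if DELETED in t)
        for m in doomed:
            del self.blobs[m]
            del self.tags[m]
        return doomed
```

Between `delete` and `cleanup`, the blob still exists and would still be pulled by other clients. They just hide it.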
> >But there are still good reasons why you'd want to have IMAP
> >capability too, e.g. Webmail. Given the atomicity problems that
> >come from Git, maybe an IMAP server reading from the Git store
> >would make sense.
>
> It wouldn't be too hard to write a FUSE filesystem that presented
> an interface to a Git repository that didn't allow the contents of
> files to be modified. Then Dovecot could think it's interacting
> with the filesystem.
Yes, a FUSE layer (which adds a daemon), or a lightweight access
API via libnotmuch. Probably the former using the latter. ;)
> Aww, I like Maildir flags, but if there's a sync tool, I'm fine
> with that.
[…]
> I'm not sure, but maybe it's safe if you refuse to ever modify
> a message's flags in the filename.
The main point is that there is nothing in Maildir filenames that you
couldn't equally (and possibly better) represent in the notmuch::* tag
namespace, and there is a benefit in treating only one of the two as
primary (which means notmuchsync can do whatever it wants without
affecting or messing with notmuch).
--
martin | http://madduck.net/ | http://two.sentenc.es/
"if I can't dance, i don't want to be part of your revolution."
- emma goldman
spamtraps: madduck.bogus@madduck.net
[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
* Re: Git as notmuch object store (was: Potential problem using Git for mail)
2010-01-25 0:46 ` Git as notmuch object store (was: Potential problem using Git for mail) martin f krafft
2010-01-25 5:19 ` Asheesh Laroia
@ 2010-01-25 13:49 ` Sebastian Spaeth
2010-01-25 16:22 ` Mike Kelly
` (2 more replies)
2010-02-15 0:51 ` Stewart Smith
2 siblings, 3 replies; 42+ messages in thread
From: Sebastian Spaeth @ 2010-01-25 13:49 UTC (permalink / raw)
To: martin f krafft, Asheesh Laroia; +Cc: notmuch
On Mon, 25 Jan 2010 13:46:59 +1300, martin f krafft <madduck@madduck.net> wrote:
> I think we all kinda agreed that the Maildir flags should not be
> used by notmuch and that things like Sebastian's notmuchsync should
> be used if people wanted flags represented in Maildir filenames.
While notmuchsync fulfils my needs, it is a kludge. It needs to call
"notmuch" for each mail where a Maildir flag has changed (which can be
quite often on an initial run, where most mails are likely to be read),
which can take a long, long time. It would make sense IMHO to at least
pick pioto's "don't set unread if 'S' flag is set" on notmuch new[1].
Or, at the very least, not to set the "unread" flag by default.
Sebastian
[1] pioto's noarg-count branch (http://git.pioto.org/gitweb/notmuch.git;
announced in mail id:20100121204201.1C82764A0E@aether.pioto.org)
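Here is a sketch of the rule Sebastian asks for. It assumes that new mail normally gets "inbox" and "unread", and that the 'S' (seen) flag appears after ":2," in the filename. It only illustrates the rule. It is not the code from pioto's branch.

```python
DEFAULT_NEW_TAGS = ("inbox", "unread")  # assumed defaults for new mail


def initial_tags(filename):
    """Tags for a newly indexed file, skipping 'unread' if Maildir says it was seen."""
    _, sep, info = filename.rpartition(":2,")  # Maildir flags follow ":2,"
    seen = bool(sep) and "S" in info
    tags = set(DEFAULT_NEW_TAGS)
    if seen:
        tags.discard("unread")
    return tags
```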
* Re: Git as notmuch object store (was: Potential problem using Git for mail)
2010-01-25 13:49 ` Sebastian Spaeth
@ 2010-01-25 16:22 ` Mike Kelly
2010-01-25 21:46 ` tag dir proposal [was: Re: Git as notmuch object store] Jameson Rollins
2010-01-25 19:49 ` Git as notmuch object store (was: Potential problem using Git for mail) martin f krafft
2010-01-27 9:00 ` Sebastian Spaeth
2 siblings, 1 reply; 42+ messages in thread
From: Mike Kelly @ 2010-01-25 16:22 UTC (permalink / raw)
To: Sebastian Spaeth, martin f krafft, Asheesh Laroia; +Cc: notmuch
On Mon, 25 Jan 2010 14:49:00 +0100, "Sebastian Spaeth" <Sebastian@SSpaeth.de> wrote:
>
> On Mon, 25 Jan 2010 13:46:59 +1300, martin f krafft <madduck@madduck.net> wrote:
> > I think we all kinda agreed that the Maildir flags should not be
> > used by notmuch and that things like Sebastian's notmuchsync should
> > be used if people wanted flags represented in Maildir filenames.
>
> While notmuchsync fulfils my needs, it is a kludge. It needs to call
> "notmuch" for each mail where a Maildir flag has changed (which can be
> quite often on an initial run, where most mails are likely to be read),
> which can take a long, long time. It would make sense IMHO to at least
> pick pioto's "don't set unread if 'S' flag is set" on notmuch new[1].
notmuchsync, as currently implemented, suffers from major performance
issues, in my opinion. It's a useful short term workaround, but not a
good long term solution.
But, I personally will always be using both notmuch and some other IMAP
client (my phone). I want the two to remain in sync easily enough.
notmuch is already much more robust with respect to that than sup, I
think (in terms of handling renames without barfing, etc).
At the very least, I want `notmuch new` to be able to:
1. If it sees a rename that involves changing maildir flags, alter the
   related tags as necessary.
2. Similarly, provide a mechanism for correlating the folder name with
   some set of tags, and change those tags as messages are moved around.
For example, I might have:
~/.notmuch-config:
[database]
path=/home/pioto/mail
...
[tags]
pioto@pioto.org/INBOX.ListMail.notmuch = notmuch
So, a 'tags' section, where each key is the folder name, relative to the
db path, and the value is one or more tag names.
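One way to read such a [tags] section is sketched below, using Python's configparser as a stand-in for the real config parser. It rests on two assumptions. First, multiple tags are separated by commas or whitespace (the message does not specify a separator). Second, folder keys must stay case-sensitive (INBOX), so configparser's default lower-casing of keys is turned off.

```python
import configparser
import re


def folder_tags(config_text):
    """Map each folder key in a [tags] section to its list of tag names."""
    cp = configparser.ConfigParser()
    cp.optionxform = str  # keep folder names case-sensitive, e.g. INBOX
    cp.read_string(config_text)
    if not cp.has_section("tags"):
        return {}
    return {
        folder: [t for t in re.split(r"[,\s]+", value.strip()) if t]
        for folder, value in cp["tags"].items()
    }
```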
This means that I could relabel a message in gmail, for example, and
have the changes apply to notmuch at my next offlineimap run. And, it
means that my existing procmail rules will still be useful both to
notmuch, and to my phone, for the purpose of categorizing things.
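When offlineimap moves a message between folders, the folder-derived tags could be swapped as in the sketch below. `retag_on_move` is a hypothetical helper. Its `folder_tags` argument is a parsed [tags] mapping from folder to tag list.

```python
def retag_on_move(tags, old_folder, new_folder, folder_tags):
    """Return the new tag set for a message moved between folders."""
    old = set(folder_tags.get(old_folder, ()))
    new = set(folder_tags.get(new_folder, ()))
    # Drop the tags implied by the old folder, add those implied by the new one.
    return (set(tags) - old) | new
```

This approach has a trade-off. A tag that the user also set by hand is removed along with the folder tag, because the sketch cannot tell the two origins apart.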
I agree that all this should be optional. But, since it is likely the
behavior most people would expect, I think it should be the default.
PS. You mean the 'new-unread' branch, not the 'noarg-count' branch, from
my repo.
--
Mike Kelly
* Re: Git as notmuch object store (was: Potential problem using Git for mail)
2010-01-25 13:49 ` Sebastian Spaeth
2010-01-25 16:22 ` Mike Kelly
@ 2010-01-25 19:49 ` martin f krafft
2010-01-27 9:00 ` Sebastian Spaeth
2 siblings, 0 replies; 42+ messages in thread
From: martin f krafft @ 2010-01-25 19:49 UTC (permalink / raw)
To: Sebastian Spaeth; +Cc: notmuch, Asheesh Laroia
[-- Attachment #1: Type: text/plain, Size: 827 bytes --]
also sprach Sebastian Spaeth <Sebastian@SSpaeth.de> [2010.01.26.0249 +1300]:
> While notmuchsync fulfils my needs, it is a kludge. It needs to
> call "notmuch" for each mail where a Maildir flag has changed
> (which can be quite often on an initial run, where most mails are
> likely to be read), which can take a long, long time. It would
> make sense IMHO to at least pick pioto's "don't set unread if 'S'
> flag is set" on notmuch new[1].
I am sure this could be implemented with libnotmuch if it proves to
be useful.
--
martin | http://madduck.net/ | http://two.sentenc.es/
"it isn't pollution that's harming the environment.
it's the impurities in our air and water that are doing it."
- dan quayle
spamtraps: madduck.bogus@madduck.net
[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
* tag dir proposal [was: Re: Git as notmuch object store]
2010-01-25 16:22 ` Mike Kelly
@ 2010-01-25 21:46 ` Jameson Rollins
2010-01-26 16:32 ` Scott Robinson
2010-01-28 5:10 ` martin f krafft
0 siblings, 2 replies; 42+ messages in thread
From: Jameson Rollins @ 2010-01-25 21:46 UTC (permalink / raw)
To: Mike Kelly, notmuch
[-- Attachment #1: Type: text/plain, Size: 2722 bytes --]
On Mon, 25 Jan 2010 11:22:47 -0500 (EST), Mike Kelly <pioto@pioto.org> wrote:
> Similarly, provide a mechanism for correlating the folder name with
> some set of tags, and change those tags as messages are moved around.
>
> For example, I might have:
>
> ~/.notmuch-config:
>
> [database]
> path=/home/pioto/mail
> ...
> [tags]
> pioto@pioto.org/INBOX.ListMail.notmuch = notmuch
>
> So, a 'tags' section, where each key is the folder name, relative to the
> db path, and the value is one or more tag names
I think this idea is a really good one and I would like to pursue it as
a tangent thread here. I was going to propose something very similar to
this. I think it's a very flexible idea that would help in a lot of
ways.
For instance, notmuch emacs (and emacs message-mode) is currently not
good at handling sent mail. At the moment mail is just Bcc'd to
yourself. However, this means that these sent messages end up back in
your inbox with 'inbox' and 'unread' tags which then need to be removed
so that the sent message is archived. If one could configure notmuch
such that only new mail in an inbox directory would be tagged with
'inbox' and 'unread', and manage to coax emacs to fcc directly into an
archive, then these sent messages would not have the problematic 'inbox'
and 'unread' tags. Even better, sent mail could then be fcc'd to a sent
mail directory, which could be configured to automatically get a
'sent' tag.
Notmuch emacs also currently does not handle message drafts, which makes
it very difficult to resume messages that were postponed from a previous
session. If notmuch could be configured to tag messages in the
message-mode "message-auto-save-directory" with a 'draft' tag, then it
would greatly facilitate finding draft messages.
It would also be sweet if this could remove tags as well (maybe by
prepending '-' or '+' to the tag specification). For example, I can
imagine implementing the above examples like this:
[database]
path=/home/jrollins/.mail
[tags]
inbox = +inbox,+unread
sent = +sent
drafts = +draft
archive = -inbox
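The +/- syntax in this example can be parsed and applied roughly as shown below. This is a sketch: the comma separator follows the example, and rejecting entries without a sign is an assumption.

```python
def parse_ops(spec):
    """Turn '+inbox,+unread' or '-inbox' into [('+', 'inbox'), ...]."""
    ops = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if item[0] not in "+-":
            raise ValueError(f"tag operation must start with + or -: {item!r}")
        ops.append((item[0], item[1:]))
    return ops


def apply_ops(tags, ops):
    """Apply (sign, tag) operations in order and return the new tag set."""
    result = set(tags)
    for sign, tag in ops:
        if sign == "+":
            result.add(tag)
        else:
            result.discard(tag)
    return result
```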
I think we should definitely implement something like this. It would
make things a lot more flexible. Notmuch could be configured to not tag
any messages by default (which would make a lot of people using notmuch
for other backends happier) and then notmuch setup could provide
an example tags stanza that would tag new messages with 'inbox' and
'unread' (maybe with a wildcard that would replicate the current
behavior):
[tags]
* = +inbox,+unread
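If the keys are treated as shell-style patterns, the rules that apply to a folder could be selected with fnmatchcase, as sketched below. Applying every matching rule in file order is an assumption. The proposal does not say how a wildcard interacts with more specific entries.

```python
from fnmatch import fnmatchcase


def matching_rules(folder, rules):
    """Return the specs of all (pattern, spec) rules matching folder, in file order."""
    return [spec for pattern, spec in rules if fnmatchcase(folder, pattern)]
```

Unlike the shell, fnmatch's `*` also matches '/', so a bare `*` covers nested folders too.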
I would love to see this. Hopefully we can rally some more support for
this idea.
jamie.
[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]
* Re: tag dir proposal [was: Re: Git as notmuch object store]
2010-01-25 21:46 ` tag dir proposal [was: Re: Git as notmuch object store] Jameson Rollins
@ 2010-01-26 16:32 ` Scott Robinson
2010-01-26 17:03 ` Jameson Rollins
2010-01-28 5:10 ` martin f krafft
1 sibling, 1 reply; 42+ messages in thread
From: Scott Robinson @ 2010-01-26 16:32 UTC (permalink / raw)
To: notmuch
Excerpts from Jameson Rollins's message of Mon Jan 25 15:46:55 -0600 2010:
> I think this idea is a really good one and I would like to pursue it as
> a tangent thread here. I was going to propose something very similar to
> this. I think it's a very flexible idea that would help in a lot of
> ways.
>
> [...]
This is getting involved.
Maybe I'm missing something in this thread; but, why couldn't these complex and
context-sensitive decisions be delegated to sub-processes? ala git hooks?
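Delegating the decision to an external program, in the spirit of Git hooks, might look like the sketch below. The hook protocol is invented here purely for illustration: the message path is passed as the last argument, and tag operations are printed on stdout. No such notmuch interface is defined in this thread.

```python
import subprocess


def run_tag_hook(hook_argv, message_path, timeout=10):
    """Run a hypothetical per-message hook and return the tag operations it prints.

    The hook receives the message path as its last argument and is expected
    to print whitespace-separated operations such as '+sent -inbox'.
    """
    proc = subprocess.run(
        list(hook_argv) + [message_path],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return proc.stdout.split()
```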
* Re: tag dir proposal [was: Re: Git as notmuch object store]
2010-01-26 16:32 ` Scott Robinson
@ 2010-01-26 17:03 ` Jameson Rollins
2010-01-28 5:12 ` martin f krafft
0 siblings, 1 reply; 42+ messages in thread
From: Jameson Rollins @ 2010-01-26 17:03 UTC (permalink / raw)
To: Scott Robinson, notmuch
[-- Attachment #1: Type: text/plain, Size: 793 bytes --]
On Tue, 26 Jan 2010 10:32:02 -0600, Scott Robinson <scott@quadhome.com> wrote:
> Excerpts from Jameson Rollins's message of Mon Jan 25 15:46:55 -0600 2010:
> > I think this idea is a really good one and I would like to pursue it as
> > a tangent thread here. I was going to propose something very similar to
> > this. I think it's a very flexible idea that would help in a lot of
> > ways.
> >
> > [...]
>
> This is getting involved.
>
> Maybe I'm missing something in this thread; but, why couldn't these complex and
> context-sensitive decisions be delegated to sub-processes? ala git hooks?
I think this idea is completely independent of anything having to do
with using git as a mail store. That's why I was trying to split it off
into a separate thread.
jamie.
[-- Attachment #2: Type: application/pgp-signature, Size: 835 bytes --]
* Re: Git as notmuch object store (was: Potential problem using Git for mail)
2010-01-25 13:49 ` Sebastian Spaeth
2010-01-25 16:22 ` Mike Kelly
2010-01-25 19:49 ` Git as notmuch object store (was: Potential problem using Git for mail) martin f krafft
@ 2010-01-27 9:00 ` Sebastian Spaeth
2 siblings, 0 replies; 42+ messages in thread
From: Sebastian Spaeth @ 2010-01-27 9:00 UTC (permalink / raw)
To: martin f krafft, Asheesh Laroia; +Cc: notmuch
On Mon, 25 Jan 2010 14:49:00 +0100, "Sebastian Spaeth" <Sebastian@SSpaeth.de> wrote:
> While notmuchsync fulfils my needs, it is a kludge. It needs to call
> "notmuch" for each mail where a Maildir flag has changed (which can be
> quite often on an initial run, where most mails are likely to be read),
> which can take a long, long time. It would make sense IMHO to at least
> pick pioto's "don't set unread if 'S' flag is set" on notmuch new[1].
Once python bindings exist for the notmuch shared library, I am sure we
can speed notmuchsync up a lot by keeping the connection open and
tagging all mails in one go rather than executing a separate binary for
each mail. So, this approach might still be feasible.
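Until bindings exist, one hypothetical middle ground is to group messages that need the same change and issue one `notmuch tag` call per group, OR-ing their id: terms together. This assumes the `notmuch tag +tag -tag -- <search-terms>` form of the CLI. The helper only builds the argument lists and does not run them.

```python
from collections import defaultdict


def batch_tag_commands(changes):
    """Group messages needing identical tag changes into one `notmuch tag` call each.

    changes maps a Message-ID to a tuple of operations such as ('+replied', '-unread').
    """
    groups = defaultdict(list)
    for msg_id, ops in changes.items():
        groups[tuple(sorted(ops))].append(msg_id)
    commands = []
    for ops, ids in sorted(groups.items()):
        terms = " or ".join(f"id:{i}" for i in sorted(ids))
        commands.append(["notmuch", "tag", *ops, "--", terms])
    return commands
```

Very large groups would need splitting to stay under the system's argument-length limit. Message-IDs containing spaces or quotes would also need quoting.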
Sebastian
* Re: tag dir proposal [was: Re: Git as notmuch object store]
2010-01-25 21:46 ` tag dir proposal [was: Re: Git as notmuch object store] Jameson Rollins
2010-01-26 16:32 ` Scott Robinson
@ 2010-01-28 5:10 ` martin f krafft
2010-01-28 12:32 ` Servilio Afre Puentes
1 sibling, 1 reply; 42+ messages in thread
From: martin f krafft @ 2010-01-28 5:10 UTC (permalink / raw)
To: Jameson Rollins; +Cc: notmuch
[-- Attachment #1: Type: text/plain, Size: 2529 bytes --]
also sprach Jameson Rollins <jrollins@finestructure.net> [2010.01.26.1046 +1300]:
> > For example, I might have:
> >
> > ~/.notmuch-config:
> >
> > [database]
> > path=/home/pioto/mail
> > ...
> > [tags]
> > pioto@pioto.org/INBOX.ListMail.notmuch = notmuch
> >
> > So, a 'tags' section, where each key is the folder name, relative to the
> > db path, and the value is one or more tag names
>
> I think this idea is a really good one and I would like to pursue it as
> a tangent thread here. I was going to propose something very similar to
> this. I think it's a very flexible idea that would help in a lot of
> ways.
I think we need to carefully distinguish here. The above seems to
suggest a mapping from folder to tag, but we don't actually need
tags for folder locations, because those can (and should) be
implicitly determined from the database and storing the tag in
addition would just run the risk of getting out of sync: if I moved
a message, I would also have to remember to delete old and add new
tags, which is just asking for trouble.
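To illustrate deriving the location rather than storing it as a tag, a message's folder can be computed from its file path relative to the database root. The helper name is hypothetical. Stripping a trailing cur/, new/ or tmp/ follows the Maildir layout.

```python
import os


def folder_of(db_path, message_path):
    """Derive a message's folder from where its file lives, relative to the database root."""
    rel = os.path.relpath(os.path.dirname(message_path), db_path)
    head, tail = os.path.split(rel)
    # Maildir files sit in cur/ or new/ (tmp/ during delivery); the folder is one level up.
    if tail in ("cur", "new", "tmp"):
        rel = head
    return rel or "."
```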
> [tags]
> inbox = +inbox,+unread
> sent = +sent
> drafts = +draft
> archive = -inbox
This proposal, on the other hand, is an interesting one, but when is
it supposed to happen? It just feels wrong to make this happen as
part of 'notmuch new'.
What I would like to see is a notmuch-aware MDA, e.g. a programme
which reads an incoming mail on stdin and can do all this kind of
stuff, e.g. assign tags based on such rules (or take tags as
arguments, so that I could trivially set tags from procmail too),
write the message to the message store, and update the database.
This would allow us to get rid of 'notmuch new' altogether, at least
conceptually. We'd still need it if mail is being delivered
independently, e.g. with offlineimap.
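Here is a minimal sketch of such a delivery agent, using Python's mailbox module to write into a Maildir. The `deliver` function and the plain dict standing in for the notmuch database are assumptions. A real tool would update the index through libnotmuch instead.

```python
import mailbox


def deliver(maildir_path, raw_message, tags, index):
    """Deliver one message into a Maildir and record its tags in the same step.

    `index` is a plain dict standing in for the notmuch database (hypothetical).
    """
    box = mailbox.Maildir(maildir_path, create=True)
    msg = mailbox.MaildirMessage(raw_message)
    key = box.add(msg)  # written to new/ by default
    msg_id = msg.get("Message-ID", key).strip().strip("<>")
    index[msg_id] = {"key": key, "tags": set(tags)}
    return msg_id
```

From procmail, a small wrapper script could read the message from stdin and pass the tags as arguments.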
On the performance side, it might make sense to write to a journal
instead of updating the database every time. SpamAssassin does this
with its Bayesian database, and it only merges the journal every
X updates (or when the user manually requests it). I am not sure
whether this is possible with Xapian. On the other hand, I think
notmuch needs to learn to journal anyway so that we can keep
different instances in sync.
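The journalling idea can be sketched as a buffer of tag operations that is merged into the index every N entries, or on request. The class and the dict-based index are hypothetical. Whether Xapian allows something similar is exactly the open question above.

```python
class JournaledTags:
    """Buffer tag changes in a journal and merge them into the index every N entries."""

    def __init__(self, index, merge_every=100):
        self.index = index             # msg id -> set of tags (stand-in for the database)
        self.journal = []              # pending (msg_id, sign, tag) entries, in order
        self.merge_every = merge_every

    def record(self, msg_id, sign, tag):
        self.journal.append((msg_id, sign, tag))
        if len(self.journal) >= self.merge_every:
            self.merge()

    def merge(self):
        """Replay pending entries into the index; return how many were merged."""
        for msg_id, sign, tag in self.journal:
            tags = self.index.setdefault(msg_id, set())
            if sign == "+":
                tags.add(tag)
            else:
                tags.discard(tag)
        merged = len(self.journal)
        self.journal.clear()
        return merged
```

Between merges, readers would have to consult the pending journal or accept slightly stale tags. This sketch does not handle that.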
--
martin | http://madduck.net/ | http://two.sentenc.es/
"the only way to get rid of a temptation is to yield to it."
-- oscar wilde
spamtraps: madduck.bogus@madduck.net
[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
* Re: tag dir proposal [was: Re: Git as notmuch object store]
2010-01-26 17:03 ` Jameson Rollins
@ 2010-01-28 5:12 ` martin f krafft
2010-01-28 5:28 ` James Westby
0 siblings, 1 reply; 42+ messages in thread
From: martin f krafft @ 2010-01-28 5:12 UTC (permalink / raw)
To: Jameson Rollins; +Cc: notmuch
[-- Attachment #1: Type: text/plain, Size: 862 bytes --]
also sprach Jameson Rollins <jrollins@finestructure.net> [2010.01.27.0603 +1300]:
> > This is getting involved.
> >
> > Maybe I'm missing something in this thread; but, why couldn't these complex and
> > context-sensitive decisions be delegated to sub-processes? ala git hooks?
>
> I think this idea is completely independent of anything having to do
> with using git as a mail store. That's why I was trying to separate it
> into a separate thread.
I think he meant "notmuch hooks like you have hooks for Git too",
e.g. thread:755741d13573c7642761d2a175cb146d
--
martin | http://madduck.net/ | http://two.sentenc.es/
"if i am occasionally a little overdressed, i make up for it by being
always immensely over-educated."
-- oscar wilde
spamtraps: madduck.bogus@madduck.net
[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
* Re: tag dir proposal [was: Re: Git as notmuch object store]
2010-01-28 5:12 ` martin f krafft
@ 2010-01-28 5:28 ` James Westby
2010-01-28 5:34 ` martin f krafft
0 siblings, 1 reply; 42+ messages in thread
From: James Westby @ 2010-01-28 5:28 UTC (permalink / raw)
To: martin f krafft, Jameson Rollins; +Cc: notmuch
On Thu, 28 Jan 2010 18:12:52 +1300, martin f krafft <madduck@madduck.net> wrote:
> also sprach Jameson Rollins <jrollins@finestructure.net> [2010.01.27.0603 +1300]:
> > > This is getting involved.
> > >
> > > Maybe I'm missing something in this thread; but, why couldn't these complex and
> > > context-sensitive decisions be delegated to sub-processes? ala git hooks?
> >
> > I think this idea is completely independent of anything having to do
> > with using git as a mail store. That's why I was trying to separate it
> > into a separate thread.
>
> I think he meant "notmuch hooks like you have hooks for Git too",
> e.g. thread:755741d13573c7642761d2a175cb146d
Are you trying to use thread: such that it could be passed to notmuch
show to see the conversation?
That's not going to work so well if so.
Thanks,
James
* Re: tag dir proposal [was: Re: Git as notmuch object store]
2010-01-28 5:28 ` James Westby
@ 2010-01-28 5:34 ` martin f krafft
2010-01-28 6:22 ` James Westby
2010-01-28 9:55 ` martin f krafft
0 siblings, 2 replies; 42+ messages in thread
From: martin f krafft @ 2010-01-28 5:34 UTC (permalink / raw)
To: James Westby; +Cc: notmuch
[-- Attachment #1: Type: text/plain, Size: 574 bytes --]
also sprach James Westby <jw+debian@jameswestby.net> [2010.01.28.1828 +1300]:
> Are you trying to use thread: such that it could be passed to
> notmuch show to see the conversation?
>
> That's not going to work so well if so.
Why not? Works fine for me with the vim plugin...
--
martin | http://madduck.net/ | http://two.sentenc.es/
"perfection is achieved, not when there is nothing more to add, but
when there is nothing left to take away."
-- antoine de saint-exupéry
spamtraps: madduck.bogus@madduck.net
[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
* Re: tag dir proposal [was: Re: Git as notmuch object store]
2010-01-28 5:34 ` martin f krafft
@ 2010-01-28 6:22 ` James Westby
2010-01-28 9:55 ` martin f krafft
1 sibling, 0 replies; 42+ messages in thread
From: James Westby @ 2010-01-28 6:22 UTC (permalink / raw)
To: martin f krafft; +Cc: notmuch
On Thu, 28 Jan 2010 18:34:21 +1300, martin f krafft <madduck@madduck.net> wrote:
> also sprach James Westby <jw+debian@jameswestby.net> [2010.01.28.1828 +1300]:
> > Are you trying to use thread: such that it could be passed to
> > notmuch show to see the conversation?
> >
> > That's not going to work so well if so.
>
> Why not? Works fine for me with the vim plugin...
lib/message.cc:560
static void
thread_id_generate (thread_id_t *thread_id)
{
    static int seeded = 0;
    FILE *dev_random;
    uint32_t value;
    char *s;
    int i;

    if (! seeded) {
        dev_random = fopen ("/dev/random", "r");
        if (dev_random == NULL) {
            srand (time (NULL));
        } else {
            fread ((void *) &value, sizeof (value), 1, dev_random);
            srand (value);
            fclose (dev_random);
        }
        seeded = 1;
    }

    s = thread_id->str;
    for (i = 0; i < NOTMUCH_THREAD_ID_DIGITS; i += 8) {
        value = rand ();
        sprintf (s, "%08x", value);
        s += 8;
    }
}
Thread IDs are generated randomly in each database, so it works fine
for you, but I have no idea which thread you are talking about.
Thanks,
James
* Re: tag dir proposal [was: Re: Git as notmuch object store]
2010-01-28 5:34 ` martin f krafft
2010-01-28 6:22 ` James Westby
@ 2010-01-28 9:55 ` martin f krafft
1 sibling, 0 replies; 42+ messages in thread
From: martin f krafft @ 2010-01-28 9:55 UTC (permalink / raw)
To: James Westby, Jameson Rollins, notmuch
[-- Attachment #1: Type: text/plain, Size: 627 bytes --]
also sprach martin f krafft <madduck@madduck.net> [2010.01.28.1834 +1300]:
> > That's not going to work so well if so.
>
> Why not? Works fine for me with the vim plugin...
Now I get it. I was talking about
id:20100114084713.GA22273@harikalardiyari
Sorry, I *am* new to notmuch ;)
--
martin | http://madduck.net/ | http://two.sentenc.es/
"when zarathustra was alone... he said to his heart: 'could it be
possible! this old saint in the forest hath not yet heard of it, that
god is dead!'"
- friedrich nietzsche
spamtraps: madduck.bogus@madduck.net
[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
* Re: tag dir proposal [was: Re: Git as notmuch object store]
2010-01-28 5:10 ` martin f krafft
@ 2010-01-28 12:32 ` Servilio Afre Puentes
2010-01-28 20:39 ` martin f krafft
0 siblings, 1 reply; 42+ messages in thread
From: Servilio Afre Puentes @ 2010-01-28 12:32 UTC (permalink / raw)
To: Jameson Rollins, Mike Kelly, notmuch
2010/1/28 martin f krafft <madduck@madduck.net>:
> also sprach Jameson Rollins <jrollins@finestructure.net> [2010.01.26.1046 +1300]:
>> > For example, I might have:
>> >
>> > ~/.notmuch-config:
>> >
>> > [database]
>> > path=/home/pioto/mail
>> > ...
>> > [tags]
>> > pioto@pioto.org/INBOX.ListMail.notmuch = notmuch
>> >
>> > So, a 'tags' section, where each key is the folder name, relative to the
>> > db path, and the value is one or more tag names
>>
>> I think this idea is a really good one and I would like to pursue it as
>> a tangent thread here. I was going to propose something very similar to
>> this. I think it's a very flexible idea that would help in a lot of
>> ways.
>
> I think we need to carefully distinguish here. The above seems to
> suggest a mapping from folder to tag, but we don't actually need
> tags for folder locations because those can (and should) be implicitly
> determined from the database
I think that the usefulness of this functionality is that we can have
a mapping from physical organization of the mail to a tagging scheme
of our choosing, and we are relieved from having to remember the
location of the mail (which can differ between mail clients).
But even right now I can't find a documented way of searching by
location, so AFAIK the implementation of this proposal would allow
something that is not possible at the moment.
>> [tags]
>> inbox = +inbox,+unread
>> sent = +sent
>> drafts = +draft
>> archive = -inbox
>
> This proposal, on the other hand, is an interesting one, but when is
> it supposed to happen? It just feels wrong to make this happen as
> part of 'notmuch new'.
Why so?
> What I would like to see is a notmuch-aware MDA, e.g. a programme
> which reads an incoming mail on stdin and can do all this kind of
> stuff, e.g. assign tags based on such rules (or take tags as
> arguments, so that I could trivially set tags from procmail too),
> write the message to the message store, and update the database.
Such an MDA wouldn't need to use "notmuch new", and thus wouldn't be
affected by this.
> This would allow us to get rid of 'notmuch new' altogether, at least
> conceptually. We'd still need it if mail is being delivered
> independently, e.g. with offlineimap.
Since we'd still need it, why not make it better?
Regards,
Servilio
* Re: tag dir proposal [was: Re: Git as notmuch object store]
2010-01-28 12:32 ` Servilio Afre Puentes
@ 2010-01-28 20:39 ` martin f krafft
2010-01-28 20:49 ` Ben Gamari
0 siblings, 1 reply; 42+ messages in thread
From: martin f krafft @ 2010-01-28 20:39 UTC (permalink / raw)
To: notmuch
[-- Attachment #1: Type: text/plain, Size: 807 bytes --]
also sprach Servilio Afre Puentes <servilio@gmail.com> [2010.01.29.0132 +1300]:
> >> [tags]
> >> inbox = +inbox,+unread
> >> sent = +sent
> >> drafts = +draft
> >> archive = -inbox
> >
> > This proposal, on the other hand, is an interesting one, but when is
> > it supposed to happen? It just feels wrong to make this happen as
> > part of 'notmuch new'.
>
> Why so?
I guess I just dislike having to run notmuch new regularly, rather
than integrating the database more closely with the mail flow.
--
martin | http://madduck.net/ | http://two.sentenc.es/
"to get back my youth i would do anything in the world, except take
exercise, get up early, or be respectable."
-- oscar wilde
spamtraps: madduck.bogus@madduck.net
[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
* Re: tag dir proposal [was: Re: Git as notmuch object store]
2010-01-28 20:39 ` martin f krafft
@ 2010-01-28 20:49 ` Ben Gamari
2010-01-28 21:11 ` martin f krafft
2010-01-28 21:16 ` Jed Brown
0 siblings, 2 replies; 42+ messages in thread
From: Ben Gamari @ 2010-01-28 20:49 UTC (permalink / raw)
To: notmuch
Excerpts from martin f krafft's message of Thu Jan 28 15:39:10 -0500 2010:
> also sprach Servilio Afre Puentes <servilio@gmail.com> [2010.01.29.0132 +1300]:
> > >> [tags]
> > >> inbox = +inbox,+unread
> > >> sent = +sent
> > >> drafts = +draft
> > >> archive = -inbox
> > >
> > > This proposal, on the other hand, is an interesting one, but when is
> > > it supposed to happen? It just feels wrong to make this happen as
> > > part of 'notmuch new'.
> >
> > Why so?
>
> I guess I just dislike having to run notmuch new regularly, rather
> than integrating the database more closely with the mail flow.
>
Sounds like you need to add a line to crontab.
- Ben
* Re: tag dir proposal [was: Re: Git as notmuch object store]
2010-01-28 20:49 ` Ben Gamari
@ 2010-01-28 21:11 ` martin f krafft
[not found] ` <1264713802-sup-620@ben-laptop>
2010-01-28 21:16 ` Jed Brown
1 sibling, 1 reply; 42+ messages in thread
From: martin f krafft @ 2010-01-28 21:11 UTC (permalink / raw)
To: Ben Gamari; +Cc: notmuch
[-- Attachment #1: Type: text/plain, Size: 788 bytes --]
also sprach Ben Gamari <bgamari@gmail.com> [2010.01.29.0949 +1300]:
> > I guess I just dislike having to run notmuch new regularly,
> > rather than integrating the database more closely with the mail
> > flow.
> >
> Sounds like you need to add a line to crontab.
It still feels like a hack. It's a bit like making many changes to
a source code repository (new mails get delivered) and committing
only once every hour (notmuch new), rather than making and
committing transactional changes (delivering and cataloguing mails
individually).
--
martin | http://madduck.net/ | http://two.sentenc.es/
a Hooloovoo is a superintelligent shade of the color blue.
-- douglas adams, "the hitchhiker's guide to the galaxy"
spamtraps: madduck.bogus@madduck.net
[-- Attachment #2: Digital signature (see http://martin-krafft.net/gpg/) --]
[-- Type: application/pgp-signature, Size: 198 bytes --]
* Re: tag dir proposal [was: Re: Git as notmuch object store]
2010-01-28 20:49 ` Ben Gamari
2010-01-28 21:11 ` martin f krafft
@ 2010-01-28 21:16 ` Jed Brown
1 sibling, 0 replies; 42+ messages in thread
From: Jed Brown @ 2010-01-28 21:16 UTC (permalink / raw)
To: Ben Gamari, notmuch
On Thu, 28 Jan 2010 15:49:34 -0500, Ben Gamari <bgamari@gmail.com> wrote:
> Sounds like you need to add a line to crontab.
I haven't been following this thread closely so I hope this isn't too
out of context. I agree that certain things like notmuch-new should go
in the crontab, but I think that notmuch-new should need to be run
exactly once to process a new batch of messages into the desired state.
Having notmuch-new apply one set of tags and then relying on another
process run afterwards to change the tags according to a filter is
undesirable in my opinion, both for the mild performance reason of
making two passes, but more importantly because of lock contention
between the two processes and the ease with which one could view the
database in an inconsistent state. As far as I understand the situation, my favorite
solution is to have notmuch-new run a hook on each message as it is
indexed.
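Here is a sketch of the single-pass approach. Each message's tags are decided before the message is written, including by an optional per-message hook, so no other process ever sees the default tags. The dict database and the hook signature are assumptions.

```python
def index_new_messages(paths, default_tags, hook=None):
    """Index each new file once, letting an optional hook adjust its tags first.

    The returned dict stands in for the database. Tags are final before a
    message is written, so the intermediate default tags are never visible.
    """
    db = {}
    for path in paths:
        tags = set(default_tags)
        if hook is not None:
            tags = set(hook(path, tags))
        db[path] = tags  # a single write per message
    return db
```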
Jed
* Re: tag dir proposal [was: Re: Git as notmuch object store]
[not found] ` <20100128221735.GE8942@lapse.rw.madduck.net>
@ 2010-01-28 23:30 ` Ben Gamari
0 siblings, 0 replies; 42+ messages in thread
From: Ben Gamari @ 2010-01-28 23:30 UTC (permalink / raw)
To: martin f krafft, notmuch
Excerpts from martin f krafft's message of Thu Jan 28 17:17:35 -0500 2010:
> Cron-scheduling is a regular activity. I am talking about
> event-based scheduling. incron could do that and fire up a process
> every time a message is dropped into a directory, but notmuch
> doesn't provide me with an interface to say "you don't have to
> iterate the Maildir yourself since I know exactly what changed: just
> update your catalog with the new message in file foo/bar.msg".
Fair enough. After reading your arguments I think I might have initially
misunderstood you. I would actually tend to agree. Passing
paths to notmuch does seem to be a reasonable approach.
>
> To me, notmuch-new is not Unix-y. To me,
>
> find $MAILDIR -type f -print0 | xargs -0 notmuch-update
>
> is Unix-y. ;)
>
I think it really depends upon what you are doing. I can certainly see
when you might want to simply have notmuch synchronize the index
against the mail store. However, it seems that most of the time one
simply wants to add a message to the index (e.g. after delivery).
Therefore, it seems like there is a place for both commands.
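A sketch of the two modes, assuming a toy catalog that maps message paths to tag sets. `add_paths`, `sync`, and `read_nul_paths` are hypothetical names, and `notmuch-update` is the hypothetical command from the quoted pipeline, not a real notmuch command. The incremental mode touches only the paths it is given, while the full mode walks the whole store.

```python
import os
import sys
from typing import Dict, Iterable, List, Set

# Toy catalog (assumption): message path -> set of tags.
Catalog = Dict[str, Set[str]]

DEFAULT_TAGS = ("inbox", "unread")


def add_paths(catalog: Catalog, paths: Iterable[str]) -> int:
    """Incremental mode: index only the paths the caller names."""
    added = 0
    for path in paths:
        if path not in catalog:
            catalog[path] = set(DEFAULT_TAGS)
            added += 1
    return added


def sync(catalog: Catalog, maildir: str) -> None:
    """Full mode: walk the whole mail store and reconcile the catalog with it."""
    on_disk = set()
    for root, _dirs, files in os.walk(maildir):
        for name in files:
            on_disk.add(os.path.join(root, name))
    # Drop entries whose files no longer exist (deleted or renamed).
    for gone in set(catalog) - on_disk:
        del catalog[gone]
    add_paths(catalog, sorted(on_disk))


def read_nul_paths(data: bytes) -> List[str]:
    """Split `find -print0` output into paths, skipping the trailing empty field."""
    return [os.fsdecode(p) for p in data.split(b"\0") if p]


if __name__ == "__main__":
    # Paths arrive as arguments (find ... -print0 | xargs -0 prog) or, failing
    # that, as NUL-separated input on stdin (find ... -print0 | prog).
    catalog: Catalog = {}  # a real tool would load a persistent index here
    paths = sys.argv[1:] or read_nul_paths(sys.stdin.buffer.read())
    print(f"indexed {add_paths(catalog, paths)} new message(s)")
```

Keying the catalog by path is a simplification: notmuch identifies messages by Message-ID, so in a real index a rename (for example from new/ to cur/) need not lose tags the way it would here.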
> > In my configuration, I simply have a bash script in ~/.bin that simply
> > runs offlineimap followed by notmuch new. This works quite nicely.
>
> This is essentially the same situation as with slocate, which has to
> be run from cron currently, and hence gets outdated regularly.
> Compare this to a hypothetical filesystem that exposed an index of
> filenames (or content even!) to user-space, which could be used to
> quickly search for files in real-time without the need to run
> regular updates. I know other operating systems that have this
> functionality already.
>
> Anyway, this is going off on a tangent, I feel.
>
That might be true, but that certainly won't stop me. ;) One would think
that it wouldn't be difficult to teach slocate about inotify. I briefly
looked into this and found rlocate, but quickly realized that it requires
its own kernel module. Apparently this has been investigated[1] and
the inotify watch count limit becomes an issue very quickly. I seem to
recall, however, that there were some whispers on the LKML about adding
an interface that would be better suited to supporting such a system. I
can't recall the details, though, and homework beckons.
Cheers,
- Ben
* Re: Git as notmuch object store (was: Potential problem using Git for mail)
2010-01-25 0:46 ` Git as notmuch object store (was: Potential problem using Git for mail) martin f krafft
2010-01-25 5:19 ` Asheesh Laroia
2010-01-25 13:49 ` Sebastian Spaeth
@ 2010-02-15 0:51 ` Stewart Smith
2 siblings, 0 replies; 42+ messages in thread
From: Stewart Smith @ 2010-02-15 0:51 UTC (permalink / raw)
To: Asheesh Laroia, notmuch
On Mon, Jan 25, 2010 at 01:46:59PM +1300, martin f krafft wrote:
> Stewart, you've worked most on this so far. Would you like to share
> your thoughts?
Just posted a new thread with my latest experiments. Things look
rather good from a storage size point of view. Still a few things to
work out though.
--
Stewart Smith
end of thread
Thread overview: 42+ messages
2010-01-11 22:19 Idea for storing tags martin f krafft
2010-01-12 3:44 ` Scott Robinson
2010-01-12 4:06 ` martin f krafft
2010-01-12 4:51 ` Potential problem using Git for mail (was: Idea for storing tags) martin f krafft
2010-01-12 19:38 ` Jameson Rollins
2010-01-12 19:55 ` martin f krafft
2010-01-14 8:12 ` Asheesh Laroia
2010-01-14 20:37 ` martin f krafft
2010-01-21 6:28 ` Asheesh Laroia
2010-01-25 0:46 ` Git as notmuch object store (was: Potential problem using Git for mail) martin f krafft
2010-01-25 5:19 ` Asheesh Laroia
2010-01-25 7:43 ` martin f krafft
2010-01-25 13:49 ` Sebastian Spaeth
2010-01-25 16:22 ` Mike Kelly
2010-01-25 21:46 ` tag dir proposal [was: Re: Git as notmuch object store] Jameson Rollins
2010-01-26 16:32 ` Scott Robinson
2010-01-26 17:03 ` Jameson Rollins
2010-01-28 5:12 ` martin f krafft
2010-01-28 5:28 ` James Westby
2010-01-28 5:34 ` martin f krafft
2010-01-28 6:22 ` James Westby
2010-01-28 9:55 ` martin f krafft
2010-01-28 5:10 ` martin f krafft
2010-01-28 12:32 ` Servilio Afre Puentes
2010-01-28 20:39 ` martin f krafft
2010-01-28 20:49 ` Ben Gamari
2010-01-28 21:11 ` martin f krafft
[not found] ` <1264713802-sup-620@ben-laptop>
[not found] ` <20100128221735.GE8942@lapse.rw.madduck.net>
2010-01-28 23:30 ` Ben Gamari
2010-01-28 21:16 ` Jed Brown
2010-01-25 19:49 ` Git as notmuch object store (was: Potential problem using Git for mail) martin f krafft
2010-01-27 9:00 ` Sebastian Spaeth
2010-02-15 0:51 ` Stewart Smith
2010-01-12 4:11 ` Idea for storing tags Scott Morrison
2010-01-13 1:24 ` martin f krafft
2010-01-13 5:39 ` Scott Morrison
2010-01-13 5:52 ` martin f krafft
2010-01-14 1:37 ` Carl Worth
2010-01-12 21:39 ` David A. Harding
2010-01-14 1:32 ` Carl Worth
2010-01-14 8:04 ` martin f krafft
2010-01-14 22:24 ` Carl Worth
2010-01-14 22:32 ` martin f krafft
Code repositories for project(s) associated with this public inbox
https://yhetil.org/notmuch.git/