* feature request: caching message arrival time
@ 2019-06-01  3:29 Daniel Kahn Gillmor
  2019-06-01 14:13 ` David Bremner
  2019-06-01 14:19 ` Ralph Seichter
  0 siblings, 2 replies; 10+ messages in thread
From: Daniel Kahn Gillmor @ 2019-06-01  3:29 UTC
To: Notmuch Mail

Hi Notmuch folks--

I'm working on Autocrypt integration for notmuch right now, and it
occurs to me that it might be useful to know the time that any given
message was first seen by notmuch.

I'm trying to not get distracted by implementing such a feature, but I
wanted to log this as a feature request, along with a few thoughts about
it.

My idea is that the first time notmuch indexes a message, it would add a
property to the message like firstseen=2019-05-31T23:15:24Z.

Some nuances spring to mind:

 * This should *not* be cleared and reset on reindexing, so it doesn't
   belong in the index.* property namespace.

 * What happens when you delete a message?  Maybe we should keep that
   value around for "ghosts" too -- can ghost documents have
   properties?  Or is it bad to remember that we've seen the message if
   someone deletes it?

 * When even the ghost goes away (e.g. full thread deletion), presumably
   this property would go away.  So if you deleted the message from
   your message store, notmuch would forget about it, and then the next
   time you ingest it it would get a later "firstseen=" property.  I'm
   ok with this.

 * i don't think we have a way to search properties by range (e.g. the
   way that we can search date ranges).  i don't need that feature for
   my use case, but maybe someone will come up with a use case that
   wants it?  is there a way to store the datestamp in a way that it can
   be scanned the way that "date" can?  or do we already have this and
   i'm just unaware?

 * What is the cost in terms of database size?  It doesn't look like it
   would be expensive to me, but i haven't profiled it.
 * if we make such a change, how should we deal with already-indexed
   messages?

Anyone have any thoughts, suggestions, or objections to this?  I'm happy
to explain more about my use case if people are interested too.

      --dkg
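A minimal sketch of the write-once behaviour proposed above, in Python.
The `properties` dict is only a stand-in for a message's property store
and `record_first_seen` is a hypothetical helper; neither is part of the
notmuch API. It shows the one rule that matters here: the value is set
on first indexing and left alone afterwards.

```python
from datetime import datetime, timezone

FIRSTSEEN = "firstseen"  # property name taken from the proposal above


def record_first_seen(properties, now=None):
    """Set firstseen only if it is absent, so reindexing never resets it.

    `properties` is a plain dict standing in for a message's property
    store (an assumption for illustration, not the notmuch API).
    """
    if FIRSTSEEN not in properties:
        if now is None:
            now = datetime.now(timezone.utc)
        # Normalise to UTC so the trailing "Z" is truthful.
        stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        properties[FIRSTSEEN] = stamp
    return properties[FIRSTSEEN]


props = {}
first = datetime(2019, 5, 31, 23, 15, 24, tzinfo=timezone.utc)
later = datetime(2019, 6, 2, 8, 0, 0, tzinfo=timezone.utc)
record_first_seen(props, first)   # initial indexing
record_first_seen(props, later)   # reindexing: value is unchanged
```

Calling it again during a reindex returns the original value, which is
the behaviour that keeps the property out of the index.* namespace.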
* Re: feature request: caching message arrival time
  2019-06-01  3:29 feature request: caching message arrival time Daniel Kahn Gillmor
@ 2019-06-01 14:13 ` David Bremner
  2019-06-01 14:19 ` Ralph Seichter
  1 sibling, 0 replies; 10+ messages in thread
From: David Bremner @ 2019-06-01 14:13 UTC
To: Daniel Kahn Gillmor, Notmuch Mail

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

> * i don't think we have a way to search properties by range (e.g. the
>   way that we can search date ranges).  i don't need that feature for
>   my use case, but maybe someone will come up with a use case that
>   wants it?  is there a way to store the datestamp in a way that it can
>   be scanned the way that "date" can?  or do we already have this and
>   i'm just unaware?

You'd need to use a value slot to get (native Xapian) range searches.
To quote the Xapian docs:

    For performance it is important to keep the amount of data stored in
    the values to a minimum, since the values for a large number of
    documents may be read during the search - the more data that has to
    be read, the slower the search will be.

So it's definitely something that would need to be profiled.  Probably
the patches that added lastmod: are a good example for someone wanting
to investigate this.
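To make the size concern concrete, here is a rough Python sketch, under
the assumption that the timestamp is stored as whole seconds since the
epoch. `encode_timestamp` and `in_range` are hypothetical names; Xapian
ships its own helpers for serialising numbers into sortable values, and
this sketch does not use them. It only illustrates that a fixed-width
big-endian encoding is much smaller than the ISO 8601 string and still
compares in chronological order.

```python
import struct
from datetime import datetime, timezone


def encode_timestamp(dt):
    """Pack a time into 4 big-endian bytes (seconds since 1970, up to 2106).

    Big-endian unsigned integers sort bytewise in numeric order, so a
    plain byte comparison is enough for a range check.
    """
    return struct.pack(">I", int(dt.timestamp()))


def in_range(value, start, end):
    """Range check performed purely on the encoded bytes."""
    return encode_timestamp(start) <= value <= encode_timestamp(end)


seen = datetime(2019, 5, 31, 23, 15, 24, tzinfo=timezone.utc)
iso = seen.strftime("%Y-%m-%dT%H:%M:%SZ")
packed = encode_timestamp(seen)
print(len(iso), len(packed))  # prints "20 4"
```

Whether those 4 bytes per document are cheap enough is exactly the
question that would still need profiling.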
* Re: feature request: caching message arrival time
  2019-06-01  3:29 feature request: caching message arrival time Daniel Kahn Gillmor
  2019-06-01 14:13 ` David Bremner
@ 2019-06-01 14:19 ` Ralph Seichter
  2019-06-01 15:30   ` Daniel Kahn Gillmor
  1 sibling, 1 reply; 10+ messages in thread
From: Ralph Seichter @ 2019-06-01 14:19 UTC
To: notmuch

* Daniel Kahn Gillmor:

> I'm working on Autocrypt integration for notmuch right now [...]

Woot! :-)

> I'm happy to explain more about my use case if people are interested
> too.

I'm interested. Right now I frankly don't know what knowing when a
message was first seen by Notmuch might be useful for. That makes it
a bit difficult for me to contemplate your questions.

-Ralph
* Re: feature request: caching message arrival time
  2019-06-01 14:19 ` Ralph Seichter
@ 2019-06-01 15:30   ` Daniel Kahn Gillmor
  2019-06-03  8:57     ` Örjan Ekeberg
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Kahn Gillmor @ 2019-06-01 15:30 UTC
To: Ralph Seichter, notmuch

On Sat 2019-06-01 16:19:19 +0200, Ralph Seichter wrote:
> I'm interested. Right now I frankly don't know what knowing when a
> message was first seen by Notmuch might be useful for. That makes it
> a bit difficult for me to contemplate your questions.

Sure, thanks for asking!  As i went to write this down, it became a lot
longer than i'd expected.  sorry about that!  On the positive side, i
may have convinced myself in the process that the threat this mechanism
would defend against is small enough that it may not be worth the
additional implementation (though if the implementation were there, we'd
certainly want to use it).

So, this is a story about Autocrypt state, out-of-order delivery, and
e-mails with suspicious date stamps ("from the future").

(if you're reading this message and haven't been following Autocrypt
closely, you can read up at https://www.autocrypt.org/)

------

When receiving an e-mail sent From: the peer foo@example.org, an
Autocrypt-capable client needs to update the Autocrypt state for that
peer's e-mail address ("foo@example.org").  This is the case for
messages that have an Autocrypt: header *and* for messages that *don't*
have one.

Both kinds of messages update the Autocrypt peer state, because if you
start receiving Autocrypt-free messages from someone who used Autocrypt
in the past, your client needs to make a note of that and consider it
when it makes its recommendation for new outbound messages to that peer.

Additionally, sometimes we receive e-mail messages out of order.
Sometimes this is because we're suddenly running across a cache of old
messages, sometimes it's because we've just popped online after a day
off, and sometimes it's because SMTP had a hiccup (there are probably
many other reasons).

We also probably don't want to store state about everyone who has ever
sent us mail *without* using Autocrypt.  At the moment, at least, that's
probably most senders, and it's both a waste of space and a potential
privacy concern to record a lot of empty state that just indicates that
you got mail from someone at some point in the past.  So if we've never
seen an Autocrypt header from a given peer, there's no state to update.

So now consider the following set of e-mail messages all from the same
sender; mails with a * have an Autocrypt header, and the time following
each message indicates its Date: header in an abstract way (higher
numbers are later than lower numbers).

   A:  (time 1)
   B*: (time 2)
   C:  (time 3)

Let's assume that i update Autocrypt state about the peer upon receipt
of each message, regardless of what order the messages were sent.  We
want the resulting Autocrypt state to be independent of the order of
delivery.

If i receive them at times 4, 5, and 6 in order (A, B, C) then i'll
think that the Autocrypt state for the peer is "we had an Autocrypt
header earlier (from B), but a more recent delivery (C) suggests that
they might not be using Autocrypt reliably" (depending on the actual
difference in time between the Date:s of B and C, the peer might end up
with an Autocrypt recommendation called "discourage").  This is the
correct state for us to end up in.

But now imagine that at times 4, 5, and 6 i receive the messages in the
order A, C, B.  i don't store Autocrypt state for the peer at times 4
and 5, because i've never seen an Autocrypt header for the peer before
and there is none in messages A and C.  So my end result is that i'll
think that the Autocrypt state for the peer is just the Autocrypt header
from B.
But that's different from what we ended up with when we received the
messages in order.

Now, we can improve on this with the following extra technique: when a
peer goes from no Autocrypt state to having an Autocrypt state, we can
search the existing index for messages from that peer with a later Date:
header.  If we find such a message, then we should include it in our
calculations.  If we do that, then we end up with the correct state,
regardless of the order of delivery.  good!

So far, we haven't needed the firstseen= property yet.  There's one
final wrinkle that introduces the need for it: message Date: headers can
be wrong.  They can even be grossly wrong -- they can be from the
future.  This can happen when the sender's clock is bad, mainly, but it
can also happen through malice (someone wanting to forge a message to
mess with the recipient's state about a given peer, for example).

So Autocrypt defines the "effective date" of a message as the *earliest*
of two dates: the date that the message is first seen, and the Date:
header itself.  So we want our augmented Autocrypt header ingestion
routine to search for all other messages we know about from the sender
that have both a later firstseen= property *and* a later Date: header.

Otherwise, one poorly formed e-mail without an Autocrypt header with the
Date: set to the year 3000 (the "bogus future message") would make it so
that the peer's recommendation would be set to "discourage" when a
message that contains an Autocrypt: header first comes in.

Conclusion
----------

Upon writing all this down, perhaps that's not such a troubling threat.
Having such a bogus future message stored in the database would indeed
leave the peer with a "discouraged" Autocrypt state upon receipt of the
first Autocrypt: header.
But if that database search only happens upon the first Autocrypt:
header seen, then a second message from the same peer would clear the
"discouraged" recommendation without consulting the bogus future message
at all.

So if the threat of a bogus future message is overcome by just a single
additional Autocrypt-enabled message from the same peer, that's not
particularly bad.  And "bogus future message" probably isn't all that
likely either.

So this isn't very high on my list of priorities after all, though if
such a firstseen property were available, i'd definitely use it to
improve the Autocrypt experience in this minor way.

        --dkg
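The A/B*/C scenario and the "effective date" rule can be sketched in
Python as follows. This is a deliberately simplified model under stated
assumptions, not the Autocrypt specification's update algorithm: `Msg`,
`effective_date` and `peer_state` are hypothetical names, times are the
abstract integers used in the message, `autocrypt_timestamp` and
`last_seen` are illustrative field names, and the state is recomputed
from the whole index instead of being updated incrementally.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Msg:
    date: int        # Date: header, in abstract time units
    first_seen: int  # when the indexer first saw the message
    autocrypt: bool  # True if an Autocrypt: header is present


def effective_date(m: Msg) -> int:
    """Earliest of the Date: header and the time of first sight."""
    return min(m.date, m.first_seen)


def peer_state(index: Iterable[Msg]) -> Optional[dict]:
    """Derive the peer state from every indexed message of one sender."""
    msgs = list(index)
    keyed = [effective_date(m) for m in msgs if m.autocrypt]
    if not keyed:
        return None  # never saw an Autocrypt header: keep no state
    return {
        "autocrypt_timestamp": max(keyed),
        "last_seen": max(effective_date(m) for m in msgs),
    }


# Delivered in order A, B*, C at times 4, 5, 6.
in_order = [Msg(1, 4, False), Msg(2, 5, True), Msg(3, 6, False)]
# Delivered as A, C, B* at times 4, 5, 6.
reordered = [Msg(1, 4, False), Msg(3, 5, False), Msg(2, 6, True)]
# A header-less message dated "year 3000" that arrived at time 5.
bogus = [Msg(3000, 5, False), Msg(2, 6, True)]
```

Because the state depends only on the set of indexed messages, both
delivery orders give the same result, and the bogus message can push
`last_seen` no further than its own arrival time.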
* Re: feature request: caching message arrival time
  2019-06-01 15:30 ` Daniel Kahn Gillmor
@ 2019-06-03  8:57   ` Örjan Ekeberg
  2019-06-03 13:17     ` Daniel Kahn Gillmor
  0 siblings, 1 reply; 10+ messages in thread
From: Örjan Ekeberg @ 2019-06-03  8:57 UTC
To: Daniel Kahn Gillmor, Ralph Seichter, notmuch

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

> So Autocrypt defines the "effective date" of a message as the *earliest*
> of two dates: the date that the message is first seen, and the Date:
> header itself.  So we want our augmented Autocrypt header ingestion
> routine to search for all other messages we know about from the sender
> that have both a later firstseen= property *and* a later Date: header.

Would it be possible to use the earliest date seen in any of the
Received: headers as a safeguard against future-dated messages?

/Örjan
* Re: feature request: caching message arrival time
  2019-06-03  8:57 ` Örjan Ekeberg
@ 2019-06-03 13:17   ` Daniel Kahn Gillmor
  2019-06-03 14:02     ` Ralph Seichter
  2019-06-03 16:02     ` Örjan Ekeberg
  0 siblings, 2 replies; 10+ messages in thread
From: Daniel Kahn Gillmor @ 2019-06-03 13:17 UTC
To: Örjan Ekeberg, Ralph Seichter, notmuch

On Mon 2019-06-03 10:57:15 +0200, Örjan Ekeberg wrote:
> Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:
>
>> So Autocrypt defines the "effective date" of a message as the *earliest*
>> of two dates: the date that the message is first seen, and the Date:
>> header itself.  So we want our augmented Autocrypt header ingestion
>> routine to search for all other messages we know about from the sender
>> that have both a later firstseen= property *and* a later Date: header.
>
> Would it be possible to use the earliest date seen in any of the
> Received: headers as a safeguard against future-dated messages?

Sure, assuming that you trust the closest MTA in the chain of MTAs that
handed the message off to you, since an adversarial proximal MTA could
manipulate all the existing Received: headers as well.

But I'm a bit uncomfortable with it: this sort of protection actually
opens up a new attack vector that didn't exist before -- any MTA in the
chain can now make the message seem like it was actually from the
*past*, just by setting its own Received: header.

Technically, of course, any MTA could munge the actual Date: header as
well to perform this kind of attack, but that munging would at least
have the potential to be detected by anyone who cares to verify DKIM
headers; but Received: headers are impossible to cover with DKIM.

If there was no expense to the indexing and storage, i'd say it would be
good to just go ahead and index the earliest Received: header as well,
to have that data trivially available as a data point in evaluating
incoming messages.
But since it sounds like there's a cost (in performance and storage)
that would need to be profiled, i don't know that i can say it's worth
the tradeoff.

Since notmuch actually knows when it received the message, it seems like
it would be simplest (and less vulnerable to manipulation) to just
record that timestamp directly.

      --dkg
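For reference, extracting the earliest Received: timestamp that Örjan
asks about is straightforward with Python's standard library. The sketch
below assumes each Received: field carries its date-time after the last
";" (the usual layout); `earliest_received` is a hypothetical helper,
and the caveat from the discussion still applies: every value it reads
was written by an MTA that has to be trusted.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def earliest_received(raw):
    """Return the earliest timestamp found in any Received: header.

    Fields without a parseable, timezone-aware date are skipped.
    Returns None when no usable Received: header exists.
    """
    msg = message_from_string(raw)
    stamps = []
    for value in msg.get_all("Received", []):
        # The date-time follows the last ';' in the field.
        _, sep, date_part = value.rpartition(";")
        if not sep:
            continue
        try:
            stamp = parsedate_to_datetime(date_part.strip())
        except (TypeError, ValueError):
            continue
        if stamp.tzinfo is None:
            continue  # naive and aware datetimes cannot be compared
        stamps.append(stamp)
    return min(stamps) if stamps else None


raw = (
    "Received: from a.example by b.example; Mon, 3 Jun 2019 10:57:20 +0200\n"
    "Received: from c.example by a.example; Mon, 3 Jun 2019 08:57:18 +0000\n"
    "Date: Mon, 3 Jun 2019 10:57:15 +0200\n"
    "\n"
    "body\n"
)
earliest = earliest_received(raw)
```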
* Re: feature request: caching message arrival time
  2019-06-03 13:17 ` Daniel Kahn Gillmor
@ 2019-06-03 14:02   ` Ralph Seichter
  2019-06-03 22:16     ` Daniel Kahn Gillmor
  2019-06-03 16:02   ` Örjan Ekeberg
  1 sibling, 1 reply; 10+ messages in thread
From: Ralph Seichter @ 2019-06-03 14:02 UTC
To: notmuch

* Daniel Kahn Gillmor:

> Since notmuch actually knows when it received the message [...]

Not meaning to complicate things, but Notmuch does not receive messages
at all. ;-) One needs to rely on some software to populate the Maildir
tree (Dovecot LMTP in my case, Postfix or some other MTA for local
delivery in other cases). Any software transporting the raw messages
can, and sometimes must, manipulate the header data, and the order in
which files within the Maildir tree are created is also not determined
by Notmuch.

As an example: My nightly backup script disables local delivery for the
duration of the backup process. Once reactivated, delivery of queued
messages resumes, but it is not guaranteed to happen in the order of
arrival. So even the local MTA, although trusted, might induce issues in
terms of delivery time.

-Ralph
* Re: feature request: caching message arrival time
  2019-06-03 14:02 ` Ralph Seichter
@ 2019-06-03 22:16   ` Daniel Kahn Gillmor
  0 siblings, 0 replies; 10+ messages in thread
From: Daniel Kahn Gillmor @ 2019-06-03 22:16 UTC
To: Ralph Seichter, notmuch

On Mon 2019-06-03 16:02:48 +0200, Ralph Seichter wrote:
> Not meaning to complicate things, but Notmuch does not receive messages
> at all. ;-) One needs to rely on some software to populate the Maildir
> tree (Dovecot LMTP in my case, Postfix or some other MTA for local
> delivery in other cases). Any software transporting the raw messages
> can, and sometimes must, manipulate the header data, and the order in
> which files within the Maildir tree are created is also not determined
> by Notmuch.
>
> As an example: My nightly backup script disables local delivery for the
> duration of the backup process. Once reactivated, delivery of queued
> messages resumes, but it is not guaranteed to happen in the order of
> arrival. So even the local MTA, although trusted, might induce issues in
> terms of delivery time.

I agree with you!  the e-mail system, like any other store-and-forward
ecosystem, offers no guarantees of message delivery.

fwiw, i'm not claiming that the time notmuch receives the message is
guaranteed to be close to the time that the message was sent.  but i can
guarantee two things:

 * notmuch cannot receive the message *before* it was sent :)

 * if the local system clock is correct, notmuch can place a plausible
   upper bound on the Date: header that is included in the message.

This alone is useful data.

     --dkg
* Re: feature request: caching message arrival time
  2019-06-03 13:17 ` Daniel Kahn Gillmor
  2019-06-03 14:02   ` Ralph Seichter
@ 2019-06-03 16:02   ` Örjan Ekeberg
  2019-06-03 22:21     ` Daniel Kahn Gillmor
  1 sibling, 1 reply; 10+ messages in thread
From: Örjan Ekeberg @ 2019-06-03 16:02 UTC
To: Daniel Kahn Gillmor, Ralph Seichter, notmuch

Daniel Kahn Gillmor <dkg@fifthhorseman.net> writes:

> Sure, assuming that you trust the closest MTA in the chain of MTAs that
> handed the message off to you, since an adversarial proximal MTA could
> manipulate all the existing Received: headers as well.
>
> But I'm a bit uncomfortable with it: this sort of protection actually
> opens up a new attack vector that didn't exist before -- any MTA in the
> chain can now make the message seem like it was actually from the
> *past*, just by setting its own Received: header.

As far as I understand the autocrypt protocol (i.e. not much;-) ), the
vulnerability is that an incoming message with a later time-stamp than
the locally saved autocrypt status can update the stored state
(e.g. turn off encryption).  Manipulating the time-stamp to make the
message appear to be *older* than it really is should only mean that it is
less likely to update the saved state?

If this is correct, using the oldest of all the time-stamps seen in the
Date-header and any of the Received-headers should be the most
defensive.

/Örjan
* Re: feature request: caching message arrival time
  2019-06-03 16:02 ` Örjan Ekeberg
@ 2019-06-03 22:21   ` Daniel Kahn Gillmor
  0 siblings, 0 replies; 10+ messages in thread
From: Daniel Kahn Gillmor @ 2019-06-03 22:21 UTC
To: Örjan Ekeberg, Ralph Seichter, notmuch

On Mon 2019-06-03 18:02:53 +0200, Örjan Ekeberg wrote:
> As far as I understand the autocrypt protocol (i.e. not much;-) ), the
> vulnerability is that an incoming message with a later time-stamp than
> the locally saved autocrypt status can update the stored state
> (e.g. turn off encryption).  Manipulating the time-stamp to make the
> message appear to be *older* than it really is should only mean that it is
> less likely to update the saved state?
>
> If this is correct, using the oldest of all the time-stamps seen in the
> Date-header and any of the Received-headers should be the most
> defensive.

It's the most defensive against one form of attack: forging e-mails
intended to update the user's Autocrypt state about a given peer.

But another form of attack is also possible: convincing the user to
*not* update their Autocrypt state about a given peer, while leaving the
original message otherwise plausible and intact, thereby raising no
suspicions about delivery problems.

I'd like notmuch's Autocrypt implementation to try to defend against
either attack where possible.

      --dkg
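The two attacks can be contrasted with a small Python sketch. It rests
on assumptions made only for illustration: times are abstract integers,
`updates_state` is a simplified stand-in for "a message older than the
stored state is ignored", and all three function names are hypothetical.

```python
def effective_oldest_of_all(date, first_seen, received):
    """Policy 1: oldest of Date:, arrival time and all Received: stamps."""
    return min([date, first_seen, *received])


def effective_date_or_arrival(date, first_seen):
    """Policy 2: earlier of Date: and the locally recorded arrival time."""
    return min(date, first_seen)


def updates_state(effective, autocrypt_timestamp):
    """Simplified rule: a message older than the stored state is ignored."""
    return effective >= autocrypt_timestamp


STORED = 100  # timestamp of the state already recorded for the peer

# Attack 1: forged message dated far in the future, first seen at 112.
forged = dict(date=3000, first_seen=112, received=[111])
# Attack 2: genuine message (Date: 110, seen at 112) whose Received:
# stamp was backdated to 50 by an MTA on the path.
backdated = dict(date=110, first_seen=112, received=[50])

forged_p1 = effective_oldest_of_all(**forged)
forged_p2 = effective_date_or_arrival(forged["date"], forged["first_seen"])
backdated_p1 = effective_oldest_of_all(**backdated)
backdated_p2 = effective_date_or_arrival(
    backdated["date"], backdated["first_seen"]
)
```

Both policies cap the forged message near its real arrival time, but
only the second one still lets the backdated genuine message update the
stored state.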