hi notmuch folks--

i've been trying to wrap my head around how to get notmuch to support
verifying cryptographically-signed mail. i'm afraid my current
understanding of the problem space is that it is neither pretty nor
clean. Sorry for the length of this message.

Scope:
------

I'm focusing initially here only on verifying PGP/MIME cleartext
signatures. I'm proposing to do the verification in the backend, and
to report on the validity of the signatures to the frontend through
"notmuch show --format=json" (ignoring the other output formats for
now).

This mail is only trying to explain how the JSON format might
communicate this information from the backend to the frontend.
(implementation will happen depending on the followup discussion, but
i don't mean for implementation questions to derail this first)

Proposal:
---------

No attempt to actually validate the signatures will be made unless the
new --verify flag is passed to "notmuch show".

A signed MIME part will contain a new element "signedby", which is a
list of part numbers identifying signatures that cover this part.

Signature parts (Content-Type: application/pgp-signature) will contain
a new element "signs", which points back to the list of parts this
signature covers. Each signature part will also contain a "sigstatus"
member, which is a list of objects, each of which contains at least
the following element:

 * "verified" -- one of the following values:

     "success" (the sig has been tested and is cryptographically valid)
     "failure" (the sig has been tested and does not match)
     "nokey"   (the sig could not be tested because pubkey is missing)
     "error"   (testing the sig failed for some other reason)
     "unknown" (testing was not tried)

If "verified" is "success" in a "sigstatus" object, then the following
fields might also be present:

 * "signingkey" -- hexadecimal representation of the 160-bit
   fingerprint of the signing key

 * "digest" -- the hash algorithm over which the sig was made
   (e.g. "SHA1")

 * "timestamp" -- the time the signature claims to have been made
   (let me know what format i should represent this in)

 * "pubkeyalgo" -- the signing key's asymmetric algorithm (e.g. "RSA")

 * "expires" -- if the signature has an expiration date, it goes here

Example:
--------

currently, the "body" element of a PGP/MIME signed message looks like
this with --format=json:

---------------------------
"body": [
    {
        "content": "here is a test message i signed on 2010-11-11.\n\n --dkg\n\n",
        "content-type": "text/plain",
        "id": 1
    },
    {
        "content-type": "application/pgp-signature",
        "filename": "signature.asc",
        "id": 2
    }
],
---------------------------

It would end up like this (without the --verify flag):

---------------------------
"body": [
    {
        "content": "here is a test message i signed on 2010-11-11.\n\n --dkg\n\n",
        "content-type": "text/plain",
        "id": 1,
        "signedby": [ 2 ]
    },
    {
        "content-type": "application/pgp-signature",
        "filename": "signature.asc",
        "id": 2,
        "signs": [ 1 ],
        "sigstatus": [ { "verified": "unknown" } ]
    }
],
---------------------------

and here it is with the --verify flag:

---------------------------
"body": [
    {
        "content": "here is a test message i signed on 2010-11-11.\n\n --dkg\n\n",
        "content-type": "text/plain",
        "id": 1,
        "signedby": [ 2 ]
    },
    {
        "content-type": "application/pgp-signature",
        "filename": "signature.asc",
        "id": 2,
        "signs": [ 1 ],
        "sigstatus": [
            {
                "verified": "success",
                "signingkey": "0EE5BE979282D80B9F7540F1CCD2ED94D21739E9",
                "digest": "SHA512",
                "timestamp": "2010-11-11 22:32:45 -0400",
                "pubkeyalgo": "RSA"
            }
        ]
    }
],
---------------------------

Observations:
-------------

i'm not covering key->userid bindings in this first pass -- it's
already complicated enough to say "the following key did actually sign
this message part". I'm still not sure whether the front-end or the
backend should be responsible for resolution of key->userid bindings,
but i'm OK punting on that question for the moment.

Multipart messages can have some parts signed and other parts not
signed: think of mailing lists which tack on a footer to each relayed
mail; the footer isn't signed, though the rest of the message is.

One MIME signature can cover more than one MIME part: Think of a
signed e-mail with an attachment. In this case, the signature is
actually over the aggregate, not the individual parts. For example, a
signed two-part message that says:

  [ (A) "this is the budget for 2011", and (B) an attached spreadsheet ]

is *not* the same as either (A) or (B) signed independently.

A multipart MIME message can contain more than one distinct signature
on different parts: Think of a digest of a mailing list discussion
between several participants who each sign their own messages. Each
signature needs to be bound to the relevant parts (and vice versa);
and some signatures within a message can fail while others succeed.

A single application/pgp-signature part could contain signing material
from multiple signers. Think of a PGP/MIME-signed key transition
document.

MIME is actually a tree structure, and any subtree can be signed. But
currently, "notmuch show" hides the tree structure and produces what
appears to be a linear set of parts.

Even more perversely, the tree structure means that a single MIME part
could potentially be signed by multiple signatures, each of which
potentially has independent origin and independent validity.

I've attached a moderately nasty e-mail message to this one
demonstrating a confluence of a bunch of these observations.

The structure of the attached e-mail looks like this:

A└┬╴multipart/signed 10936 bytes
B ├┬╴multipart/mixed 7403 bytes
C │├╴text/plain 77 bytes
D │├╴image/jpeg attachment [dkg.jpg] 4753 bytes
E │└┬╴message/rfc822 2072 bytes
F │ └┬╴multipart/signed 1914 bytes
G │  ├╴text/plain 57 bytes
H │  └╴application/pgp-signature attachment [signature.asc] 900 bytes
I └╴application/pgp-signature attachment [signature.asc] 900 bytes

"notmuch show" emits it as 5 parts (omitting A, B, E, and F):

 1: C
 2: D
 3: G
 4: H
 5: I

Note that while C and D are both signed by I, G is actually signed by
both H and I. yuck. And since this example message is attached to the
e-mail i'm writing right now (which itself will be signed) it can
certainly get even yuckier.

Questions:
----------

Am i missing any data or relationships you think we might want?

Is anything broken, unexpected, or dangerous about the choice of JSON
modifications?

I realize i've gone down a bit of a rabbit hole in the corner cases
here (driven mainly by my observations section). Are there any
simplifying assumptions we can safely make about what kinds of
messages are worth verifying? That is, are there ways to make this
more intelligible that don't throw away our ability to accurately
represent the verified state of some non-trivial subset of messages?

If this method (or something similar to it) gets put into the notmuch
backend, is this something we can actually represent to a human with a
reasonable frontend?

Would it make more sense to do deeper structural modifications of the
json output (e.g. return the full MIME tree instead of a list of
parts) than to go with the current proposal?

It would be nice to make this kind of reporting structure also work
for S/MIME and maybe other crypto-signature structures like DKIM. Is
that doable within this framework? are there other tweaks we might
want to consider to cover that possibility?

If you actually read this far, you are a champion! I look forward to
any feedback you have.

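For anyone who wants to poke at the nasty example above, here is a rough Python sketch (not notmuch code, just a thought experiment) of how the proposed "signedby" and "signs" lists could be derived from the tree. The TREE tuples, the helper names, and the "label" field are all made up for illustration. The only rule it assumes is that a multipart/signed node has exactly two children, and that the second (the signature) covers every leaf under the first.

```python
# Hypothetical stand-in for the attached message's MIME tree:
# (label, content_type, children).  Not a notmuch or GMime structure.
TREE = ("A", "multipart/signed", [
    ("B", "multipart/mixed", [
        ("C", "text/plain", []),
        ("D", "image/jpeg", []),
        ("E", "message/rfc822", [
            ("F", "multipart/signed", [
                ("G", "text/plain", []),
                ("H", "application/pgp-signature", []),
            ]),
        ]),
    ]),
    ("I", "application/pgp-signature", []),
])


def leaves(node):
    """Return the labels of the leaf parts under node, in document order."""
    label, _ctype, children = node
    if not children:
        return [label]
    found = []
    for child in children:
        found.extend(leaves(child))
    return found


def coverage(node, signedby=None):
    """Map each leaf label to the labels of the signatures covering it."""
    if signedby is None:
        signedby = {}
    label, ctype, children = node
    if not children:
        signedby.setdefault(label, [])
    # Assumption: multipart/signed is [signed content, signature].
    if ctype == "multipart/signed" and len(children) == 2:
        signed, sig = children
        for leaf in leaves(signed):
            signedby.setdefault(leaf, []).append(sig[0])
    for child in children:
        coverage(child, signedby)
    return signedby


def proposed_body(tree):
    """Flatten the tree into a linear part list with signedby/signs."""
    order = leaves(tree)
    ident = {label: i + 1 for i, label in enumerate(order)}
    signedby = coverage(tree)
    body = []
    for label in order:
        # "label" is only here for readability; it is not part of the proposal.
        part = {"id": ident[label], "label": label}
        if signedby[label]:
            part["signedby"] = sorted(ident[s] for s in signedby[label])
        signs = sorted(ident[l] for l in order if label in signedby[l])
        if signs:
            part["signs"] = signs
        body.append(part)
    return body
```

Calling proposed_body(TREE) numbers the leaves 1 through 5 as C, D, G, H, I, and gives part 3 (G) a "signedby" list naming both 4 (H) and 5 (I). Note that this naive rule also reports H itself as covered by I, because H sits inside the subtree that I signs.
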
OK, off to bed!

 --dkg
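P.S. to make the front-end side a little more concrete, here is a minimal Python sketch of how a consumer might read the proposed fields from the --verify example earlier in this mail. The function name part_status is hypothetical, and treating a signature part with no "sigstatus" as "unknown" is my assumption, not something the proposal settles. The BODY literal is just the example above with the message content left out.

```python
# The --verify example from this mail, as a Python literal
# (the "content" member is omitted for brevity).
BODY = [
    {"content-type": "text/plain", "id": 1, "signedby": [2]},
    {"content-type": "application/pgp-signature",
     "filename": "signature.asc",
     "id": 2,
     "signs": [1],
     "sigstatus": [{"verified": "success",
                    "signingkey": "0EE5BE979282D80B9F7540F1CCD2ED94D21739E9",
                    "digest": "SHA512",
                    "timestamp": "2010-11-11 22:32:45 -0400",
                    "pubkeyalgo": "RSA"}]},
]


def part_status(body):
    """Map each part id to a list of (signature id, verified, signingkey)."""
    by_id = {part["id"]: part for part in body}
    result = {}
    for part in body:
        found = []
        for sig_id in part.get("signedby", []):
            sig = by_id.get(sig_id, {})
            # Assumption: a missing "sigstatus" is treated as "unknown".
            for status in sig.get("sigstatus", [{"verified": "unknown"}]):
                found.append((sig_id,
                              status["verified"],
                              status.get("signingkey")))
        result[part["id"]] = found
    return result
```

A front-end could then decide how to render each part from part_status(BODY)[part_id]: an empty list means nothing claims to sign the part, and one entry per "sigstatus" object otherwise.
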