Date: Mon, 23 Jan 2012 20:16:09 -0500
From: Austin Clements
To: Mark Walters
Subject: Re: [PATCH] Automatically exclude tags in notmuch-show
Message-ID: <20120124011609.GX16740@mit.edu>
References: <874nvric7c.fsf@qmul.ac.uk> <1327010583-23954-1-git-send-email-markwalters1009@gmail.com> <20120119225910.GT16740@mit.edu>
 <871uqvgrnm.fsf@qmul.ac.uk> <20120120171801.GA16740@mit.edu> <20120122181609.GQ16740@mit.edu> <87zkdfgr0m.fsf@qmul.ac.uk> <20120123015222.GB7600@mit.edu> <87r4yphpu7.fsf@qmul.ac.uk>
In-Reply-To: <87r4yphpu7.fsf@qmul.ac.uk>
Cc: notmuch@notmuchmail.org
List-Id: "Use and development of the notmuch mail system."

Quoth Mark Walters on Jan 24 at 1:05 am:
>
> On Sun, 22 Jan 2012 20:52:22 -0500, Austin Clements wrote:
> > Quoth Mark Walters on Jan 23 at 1:13 am:
> > > On Sun, 22 Jan 2012 13:16:09 -0500, Austin Clements wrote:
> > > > Quoth myself on Jan 20 at 12:18 pm:
> > > > > Quoth Mark Walters on Jan 20 at 12:10 am:
> > > > > >
> > > > > > Ok, having said this is trivial, I have found a problem. What should
> > > > > > notmuch do if you do something like
> > > > > >
> > > > > > notmuch show id:
> > > > > > and that message is marked with a deleted tag? To be consistent with the
> > > > > > other cases (where a deleted message is in a matched thread) we might
> > > > > > want to return the message with the not-matched flag set (eg in
> > > > > > JSON). But my patch doesn't, as it never even sees the thread since it
> > > > > > doesn't match.
> > > > > >
> > > > > > Looking at notmuch-show.c I think we should not apply the exclude tags
> > > > > > to do_show_single, but usually should apply it to do_show. One solution
> > > > > > which is simple and is at least close to right would be to get do_show
> > > > > > to return the number of threads found. If this is zero then retry the
> > > > > > query without the excludes (possibly setting the match_flag to zero on
> > > > > > each message since we know it does not match).
> > > > > >
> > > > > > This is not a completely correct solution as if you ask notmuch-show to
> > > > > > show more than one thread it might miss threads which only contain deleted
> > > > > > messages.
> > > > > >
> > > > > > I can't see other good possibilities without slowing down the normal
> > > > > > path a lot (eg find all threads that match the original query and then
> > > > > > apply the argument above).
> > > > > >
> > > > > > Any thoughts?
> > > > >
> > > > > Oh dear.
> > > > >
> > > > > Well, here's one idea. Instead of doing a single thread query in
> > > > > show, do a thread query without the exclusions and then a message
> > > > > query with the exclusions. Output all of the messages from the first
> > > > > query, but use the results of the second query to determine which
> > > > > messages are "matched". The same could be accomplished in the library
> > > > > somewhat more efficiently, but it's not obvious to me what the API
> > > > > would be.
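A minimal sketch of the two-query idea quoted above, written in C but deliberately not using the real libnotmuch API: it assumes each query's results have already been reduced to arrays of document IDs, with the exclusion-filtered message query sorted ascending. The names `docid_t`, `docid_cmp` and `mark_matched` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Xapian document ID. */
typedef unsigned int docid_t;

/* Comparison callback for bsearch over an ascending docid array. */
static int docid_cmp(const void *a, const void *b)
{
    docid_t x = *(const docid_t *) a;
    docid_t y = *(const docid_t *) b;
    return (x > y) - (x < y);
}

/*
 * thread_msgs: every message of a thread, from the thread query run
 *              WITHOUT exclusions (all of these get output).
 * filtered:    docids from the message query run WITH exclusions,
 *              sorted ascending.
 * matched:     out-param; matched[i] is true iff thread_msgs[i] also
 *              appears in the exclusion-filtered result.
 */
static void mark_matched(const docid_t *thread_msgs, size_t n_msgs,
                         const docid_t *filtered, size_t n_filtered,
                         bool *matched)
{
    assert(matched != NULL || n_msgs == 0);
    for (size_t i = 0; i < n_msgs; i++) {
        /* Guard n_filtered == 0 so bsearch never sees a NULL base. */
        matched[i] = n_filtered > 0 &&
            bsearch(&thread_msgs[i], filtered, n_filtered,
                    sizeof(docid_t), docid_cmp) != NULL;
    }
}
```

If the thread's docids were also sorted, a linear merge of the two lists would avoid the per-message binary search; `bsearch` just keeps the sketch short.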
> > > > >
> > > > Here's a slightly crazier idea that's more library-invasive than the
> > > > original approach, but probably better in the long run.
> > > >
> > > > Have notmuch_query_search_* return everything and make exclusion a
> > > > message flag like NOTMUCH_MESSAGE_FLAG_MATCH. Tweak the definition of
> > > > "matched" to mean "matched and not excluded" (specifically, a message
> > > > would have the match flag or the excluded flag or neither, but not
> > > > both). Search would skip threads with zero matched messages and I
> > > > think show would Just Work.
> > > >
> > > > I can think of two ways to implement this. notmuch_query_search_*
> > > > could perform both the original query and the query with exclusions
> > > > and use the docid set from the second to compute the "excluded"
> > > > message flag. Alternatively, it could examine the tags of each
> > > > message directly to compute the flag. The latter is probably easier
> > > > to implement, but probably slower.
> > > >
> > > > Thoughts?
> > >
> > > I have now thought about this some more and think I understand your idea
> > > (and how it would work) rather better now.
> > >
> > > I would suggest one small change: the flags for the messages returned
> > > should be "independent": so a message can match the query or not, and it
> > > can be excluded or not, with all 4 combinations being possible. (The
> > > consumer of notmuch_query_search_* would extract the information it
> > > wanted.)
> >
> > I'd initially approached it this way, but went with redefining
> > "matched" messages because it had much less impact on the API. For
> > example, with the redefined "match",
> > notmuch_thread_get_matched_messages still does the right thing for
> > search and things like the thread subject can still be based on
> > "matched" messages. If we orthogonalize these flags, then we at
> > least need to count matched non-excluded messages and provide an API
> > to access this (while I don't have a solid argument against such an
> > API it just seems weirdly specific to me).
>
> Ok, I have an initial implementation of this which I will post as a reply
> to this thread: it does make the flags orthogonal but that would be easy
> to change. If we do want to keep match to mean match and not excluded
> then I would argue for a third flag so that the emacs frontend could see
> all 4 possibilities. Note that in your suggestion we still need to do
> something in notmuch_thread_get_matched_messages to set the subject etc
> in threads with no matching non-excluded messages.

Cool. I was starting to hack together an implementation too, but I'll
put that on hold. Since notmuch_thread_get_matched_messages just counts
matched messages, it doesn't need anything special for threads with no
matched messages, but you're right that something has to be done about
the subjects of threads containing only excluded messages.

Could you CC me on the patch? The list has been stuck all day.

> > My other concern is performance. In thread queries, marking
> > non-matched messages as excluded would require either an extra query
> > per thread or a single query to match all excluded messages (not
> > filtered by the primary query). The former is prohibitive, though the
> > latter might be acceptable (that might depend on how many things
> > people mark as spam or deleted). If the cost is too high, this
> > suggests that we shouldn't mark non-matched messages as excluded, but
> > then we're back to effectively having three levels of matching: not
> > matched, matched but not excluded, and matched but excluded.
>
> The implementation follows the second suggestion. I'll discuss
> performance below.
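The two flag designs discussed above can be compared in a small hypothetical sketch. `MSG_FLAG_MATCH` and `MSG_FLAG_EXCLUDED` only mirror the role of `NOTMUCH_MESSAGE_FLAG_MATCH` (their values are assumptions), and `struct msg`, `redefined_flags` and `count_matched` are illustrative names. Each message stores the two orthogonal facts from Mark's proposal; `redefined_flags` derives Austin's encoding, in which "matched" means "matched and not excluded" and a message never carries both flags; `count_matched` is the count a `notmuch_thread_get_matched_messages`-style accessor would report under that encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; NOT notmuch's real notmuch_message_flag_t. */
enum {
    MSG_FLAG_MATCH    = 1 << 0,
    MSG_FLAG_EXCLUDED = 1 << 1,
};

/* Orthogonal representation: both facts recorded independently,
 * so all four combinations are possible. */
struct msg {
    bool matches_query;    /* message matches the user's query */
    bool has_excluded_tag; /* message carries one of the exclude tags */
};

/* Redefined encoding: MATCH means "matched and not excluded";
 * a message gets MATCH, EXCLUDED, or neither, never both. */
static unsigned redefined_flags(struct msg m)
{
    unsigned flags = 0;
    if (m.matches_query && !m.has_excluded_tag)
        flags |= MSG_FLAG_MATCH;
    if (m.has_excluded_tag)
        flags |= MSG_FLAG_EXCLUDED;
    assert(!((flags & MSG_FLAG_MATCH) && (flags & MSG_FLAG_EXCLUDED)));
    return flags;
}

/* Matched count under the redefinition; search would skip a thread
 * for which this is zero. */
static size_t count_matched(const struct msg *msgs, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (redefined_flags(msgs[i]) & MSG_FLAG_MATCH)
            count++;
    return count;
}
```

Under the redefinition, a message that matched but is excluded and one that is excluded without matching end up with identical flags; that information loss is what the suggested third flag would recover, so that the Emacs frontend can see all four cases.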
>
> > > I have thought about some implementation ideas but I think sorting is
> > > going to be the deciding factor: what order should
> > > notmuch_query_search_* return messages/threads?
> >
> > Yes. This is exactly what I've been puzzling over, too.
> >
> > > For notmuch_query_search_messages either it returns them all together
> > > with the excluded messages marked, or returns all included ones, and
> > > then all excluded ones.
> >
> > I would prefer them intermingled. I feel like returning one and then
> > the other is just exposing implementation details. Plus, it's unclear
> > if the order of the two groups should depend on the sort order, be
> > configurable, or what. Intermingling them seems like the obvious
> > answer.
> >
> > > For notmuch_query_search_threads it is less clear. Currently it returns
> > > threads in order of first matching message. It is not clear what
> > > matching means now: is it matching and included, or just matching? If the
> > > former then we will be returning some threads with no matching and
> > > included messages so we need to decide where to put them in the order.
> >
> > I would argue that, if the caller cares about the sort order of the
> > results, it only makes sense for it to skip over threads consisting
> > only of excluded messages, and if the caller doesn't care about the
> > sort order, we can choose whatever's most convenient.
>
> I have gone with this intermingled suggestion.
>
> One other implementation detail: the code uses the exclude query
> generated by your _notmuch_exclude_tags function; thus if the user
> overrides the exclude with tag:deleted then these messages will not be
> marked excluded. I can't work out whether that is the right behaviour.

I think that's reasonable. Otherwise we need yet another state,
matched-but-excluded-but-overridden.

> The code as it stands should finish where we started, but have the
> infrastructure to show excluded messages.
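The sort-order argument above amounts to a stable filter: results stay intermingled in the query's sort order, and a sort-sensitive caller drops threads whose matched, non-excluded count is zero without reordering the rest. A hedged sketch, assuming a hypothetical `struct thread_summary` with precomputed counts and a hypothetical `skip_excluded_only` helper (this is not notmuch's thread API):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread summary, already in the query's sort order. */
struct thread_summary {
    const char *thread_id;
    size_t matched; /* messages that match and are not excluded */
    size_t total;   /* all messages in the thread */
};

/*
 * Compact `threads` in place, keeping only threads with at least one
 * matched, non-excluded message. Relative order is preserved, so the
 * caller's requested sort order is unaffected. Returns the new length.
 */
static size_t skip_excluded_only(struct thread_summary *threads, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        assert(threads[i].matched <= threads[i].total);
        if (threads[i].matched > 0)
            threads[kept++] = threads[i];
    }
    return kept;
}
```

A caller that does not care about order could instead keep the dropped threads and append them at the end, which is one possible answer to where hidden threads would go in the list.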
>
> I think a nice interface would be for the emacs notmuch-search frontend
> to take all messages and hide all threads with 0 matching messages, but
> let them be unhidden. (Possibly if there are no non-hidden threads it
> could offer to show them.) The notmuch-show frontend could take all

That does sound nice, though where would they go in the list? Just at
the end?

> messages but show only a header line for excluded messages (perhaps in a
> different colour). It would jump to the first matching non-excluded
> message unless there aren't any, in which case it would go to the first
> matching (necessarily excluded) message instead.

That sounds reasonable. Alternatively, show could parallel search and
hide these messages, but point out that there are excluded messages and
offer to show them.

> Performance: the code/notmuch is generally fast. On my computer/email
> archive of circa 70000 messages the time to construct the list of excluded
> messages is of the order of 20 milliseconds. (Inherently it should not
> be much slower since the thread return already has to construct a
> complete list of all messages matching the query before it can return
> any threads.)

I think this time will depend heavily on how many excluded messages you
have. My performance concern is that it's not bounded by how many
messages match the original query.

> However, in some cases it will appear slower: if there are lots of
> threads that only match in excluded messages then this code will waste
> time constructing these "hidden threads". In other words the code will
> be comparable to the pre-exclude notmuch, since it is constructing the
> same threads.
>
> Finally, the code is obviously an early draft to see what people
> think. I know there are things that need freeing, and probably some
> error returns that should be checked.
>
> Best wishes
>
> Mark
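The notmuch-show behaviour described above (jump to the first matching, non-excluded message, otherwise to the first matching and therefore excluded one) needs only a single scan. This is a hypothetical sketch in C rather than the Emacs Lisp the frontend is written in; `struct shown_msg` and `initial_message` are illustrative names, and it assumes each message carries the two orthogonal bits from Mark's draft.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-message view state, in display (thread) order. */
struct shown_msg {
    bool match;    /* matched the user's query */
    bool excluded; /* carries one of the exclude tags */
};

/*
 * Index of the message to jump to: the first message that matches and
 * is not excluded; failing that, the first message that matches at all
 * (which must then be excluded). Returns -1 if nothing matches, leaving
 * the fallback to the caller.
 */
static long initial_message(const struct shown_msg *msgs, size_t n)
{
    assert(msgs != NULL || n == 0);
    long first_match = -1;
    for (size_t i = 0; i < n; i++) {
        if (!msgs[i].match)
            continue;
        if (!msgs[i].excluded)
            return (long) i;   /* best case: matched, not excluded */
        if (first_match < 0)
            first_match = (long) i; /* remember first excluded match */
    }
    return first_match;
}
```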