From: Daniel Kahn Gillmor
To: Notmuch Mail
Subject: Re: thread merge/split proposal
In-Reply-To: <87mvp9uwi4.fsf@alice.fifthhorseman.net>
References: <87mvp9uwi4.fsf@alice.fifthhorseman.net>
User-Agent: Notmuch/0.21+124~gbf604e9 (http://notmuchmail.org) Emacs/24.5.1 (x86_64-pc-linux-gnu)
Date: Mon, 04 Apr 2016 15:23:43 -0300
Message-ID: <87k2kdutao.fsf@alice.fifthhorseman.net>

On Mon 2016-04-04 14:14:27 -0300, Daniel Kahn Gillmor wrote:
> b) when an unjoin is requested, do a graph analysis of every message in
>    the thread's In-Reply-To and References headers, and recreate
>    distinct threads from the connected components.
[...]
> From the CLI, it would look something like:
>
>   notmuch join-threads THREAD_A THREAD_B [ THREAD_C ... ]
>   notmuch split-thread THREAD_X

On IRC, bremner pointed out two specific improvements to this proposal:

 0) the inverse operation of "join" proposed above is distinct from the
    ongoing discussion about splitting threads in arbitrary places. I
    don't want to conflate these issues, so my proposed
    connected-component-analysis operation should be "notmuch
    unjoin-thread", and not "notmuch split-thread".

 1) a "join" operation probably has to be stored explicitly in the
    database, so that the threads will be re-joined across a
    dump/restore operation.

I'm happy with both of these improvements.

for (1), i'd propose that the join operation be implemented by adding a
new term type "join", which can be applied to any document. Its value
is the message-id of a message that this one *should* have named in
"In-Reply-To" but didn't.

So for example: messages A and B are in one thread; messages C and D
arrive in a separate thread that should have been joined to the prior
thread but was not.

i propose implementing this as something like:

  notmuch_message_add_term(message_c, "join", get_message_id(message_a));
  notmuch_message_set_thread_id(message_c, get_thread_id(message_a));
  notmuch_message_set_thread_id(message_d, get_thread_id(message_a));

i'd also add all the "join" terms to "notmuch dump", though i'm not sure
exactly how to extend the "notmuch dump" format.

feedback welcome,

        --dkg
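
To make the connected-component analysis behind the proposed "notmuch unjoin-thread" concrete, here is a minimal sketch using union-find. Everything in it is an assumption for illustration: `struct msg`, `MAX_REFS`, `find_msg`, `find_root` and `label_components` are hypothetical names, not notmuch API, and the messages are plain in-memory structs rather than database documents. Each message lists the ids it points at; whether a "join" value counts as one of those edges is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REFS 8

/* Hypothetical stand-in for a message; not a notmuch type. */
struct msg {
    const char *id;             /* Message-ID */
    const char *refs[MAX_REFS]; /* ids from In-Reply-To, References and,
                                 * optionally, "join"; NULL-terminated
                                 * unless all slots are used */
};

/* Return the index of the message with the given id, or -1 if absent. */
static int find_msg(const struct msg *msgs, int n, const char *id)
{
    for (int i = 0; i < n; i++)
        if (strcmp(msgs[i].id, id) == 0)
            return i;
    return -1;
}

/* Union-find root lookup with path halving. */
static int find_root(int *parent, int i)
{
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

/* Fill label[i] with the index of the representative of message i's
 * connected component. Messages with equal labels would share a thread. */
static void label_components(const struct msg *msgs, int n, int *label)
{
    assert(msgs != NULL && label != NULL && n >= 0);

    for (int i = 0; i < n; i++)
        label[i] = i;

    for (int i = 0; i < n; i++) {
        for (int r = 0; r < MAX_REFS && msgs[i].refs[r] != NULL; r++) {
            int j = find_msg(msgs, n, msgs[i].refs[r]);
            if (j < 0)
                continue; /* referenced message is not in this set */
            int a = find_root(label, i);
            int b = find_root(label, j);
            if (a != b)
                label[b] = a; /* merge the two components */
        }
    }

    /* Flatten so every entry holds its component's representative. */
    for (int i = 0; i < n; i++)
        label[i] = find_root(label, i);
}
```

With the example from the message (B replies to A, D replies to C), this yields two components; adding A's id to C's list, as a "join" term would, collapses them into one. The linear `find_msg` lookup keeps the sketch short; a real index would look ids up directly.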