From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from localhost (localhost [127.0.0.1]) by olra.theworths.org (Postfix) with ESMTP id 2959D431FBC for ; Wed, 17 Feb 2010 07:03:44 -0800 (PST) X-Virus-Scanned: Debian amavisd-new at olra.theworths.org X-Spam-Flag: NO X-Spam-Score: -0.866 X-Spam-Level: X-Spam-Status: No, score=-0.866 tagged_above=-999 required=5 tests=[AWL=-0.867, BAYES_50=0.001] autolearn=ham Received: from olra.theworths.org ([127.0.0.1]) by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ERE6rLUVQOBE for ; Wed, 17 Feb 2010 07:03:43 -0800 (PST) Received: from mail-pz0-f193.google.com (mail-pz0-f193.google.com [209.85.222.193]) by olra.theworths.org (Postfix) with ESMTP id 53904431FAE for ; Wed, 17 Feb 2010 07:03:43 -0800 (PST) Received: by pzk31 with SMTP id 31so5427729pzk.32 for ; Wed, 17 Feb 2010 07:03:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:content-type:subject:from:to :in-reply-to:references:date:message-id:user-agent :content-transfer-encoding; bh=ubGg4GMtqtlMfF9Q4QNYsCwrmpzy2GgvOsyF38MmHP0=; b=VtjX1ZIi9sxWlCq0WYGNKkH1cIrgPSdQXnuHthbmSv1cODl1ODZNAONWhEeC/XMSHp q69aWheZe/fhDEaZr1yDQxEVzMBeML+ZjOqYvmrqsdZiaF4Yuj61T7syeDYRzlBdszlA nfwhVwIn1qSfPCi4JwTdYRf10JqpVIosjXncA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=content-type:subject:from:to:in-reply-to:references:date:message-id :user-agent:content-transfer-encoding; b=epSUHqhC5QxjTMU5dw2uMb6Q6ldcR5TjhFOCZ859qsNzPqMDPiPmzQm7Ek0UQWY9+G zkON9CJwRa+EpVAzzdRBfXs4J98RKpM1l9ml6OGZysB2NKhpI7Yw2e+mCb8Pfwafq9db FQiWbMmE2gteR8oxsw6IC0xRJf9iCOiH8UdIc= Received: by 10.140.55.4 with SMTP id d4mr5353505rva.117.1266419022448; Wed, 17 Feb 2010 07:03:42 -0800 (PST) Received: from localhost (umass-959-129.wireless.umass.edu [128.119.77.129]) by mx.google.com with ESMTPS id 15sm2628108pwi.0.2010.02.17.07.03.38 (version=TLSv1/SSLv3 cipher=RC4-MD5); Wed, 17 Feb 2010 07:03:40 -0800 (PST) Content-Type: text/plain; charset=UTF-8 From: Ben Gamari To: notmuch In-reply-to: <20100217012101.GD8249@lapse.rw.madduck.net> References: <20100215002914.GA22402@flamingspork.com> <20100217012101.GD8249@lapse.rw.madduck.net> Date: Wed, 17 Feb 2010 10:03:36 -0500 Message-Id: <1266418124-sup-6308@ben-laptop> User-Agent: Sup/git Content-Transfer-Encoding: 8bit Subject: Re: Mail in git X-BeenThere: notmuch@notmuchmail.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: "Use and development of the notmuch mail system." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Feb 2010 15:03:44 -0000 Excerpts from martin f krafft's message of Tue Feb 16 20:21:01 -0500 2010: > What I am wondering is if (explicit) tags couldn't be represented as > tree-objects with this. > > evenless-link — link a message object with a tree object > evenless–unlink – unlink a message object from tree object > [replaces evenless-unlink] I was actually wondering this very thing. I'd just be worried about tags with large numbers of messages (presumably we would need an All tag, that would contain a reference to every known message). It seems like the simple act of adding a message to the repository could turn into an extremely expensive operation. Moreover, deleting a message could also be quite expensive as this will require rewriting all of the tags that reference it. Surely, we would need to batch these sort of operations to avoid disasterous performance. However, even with batching, it seems we would face some pretty serious scalability issues. I think if we were to implement tag storage in trees, we'd need to use a multi-level tree. This way we could avoid rewriting a tree object containing all of the tag's messages on every change. I apologize if this was already obvious to everyone but me. > > messages would then be deleted whenever using git-gc. > > No idea how this would sync if we don't keep ancestry. Otoh, it > would probably not be very expensive to do just that. I think that keeping the ancestry would be quite important and would come with relatively low overhead given the correct dereferencing of data structures. > > notmuch would then only search and provide the hash ID(s); tags > would be a function of storage. > > Is it possible to find out all trees that reference a given object > with Git in constant or sub-linear time? > I don't believe so. I think this is one of the reasons why git gc is so expensive. - Ben