Date: Thu, 22 Feb 2018 21:12:34 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: Re: [v2] introduction of content_id
Message-ID: <20180222211234.GA22833@dcvr>
References: <20180209181718.GA8847@dcvr>
In-Reply-To: <20180209181718.GA8847@dcvr>

Eric Wong wrote:
> In addition to the git object_id (blob SHA-1) and Message-Id
> header; it seems necessary to introduce an in-between identifier
> for deduplicating which isn't as loose as Message-Id or as
> strict as object_id: content_id

I think this will only be calculated on the fly, in cases where the
Message-ID matches.  No need to cement it into the Xapian DB,
meaning we can tweak which headers we care about more freely.
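
A rough sketch of that idea, in Python purely for illustration and not
the actual public-inbox code. Everything named here is an assumption:
the header list in CONTENT_ID_HEADERS, the use of SHA-256, the
content_id() and is_duplicate() helpers, and the existing_by_mid dict
standing in for whatever Message-ID lookup the index provides. The
point is only that content_id is derived when a Message-ID collision is
seen and is never stored.

```python
import hashlib
from email import message_from_string

# Hypothetical header set. Since nothing is stored in the index,
# this tuple can be changed at any time without a reindex.
CONTENT_ID_HEADERS = ("Subject", "From", "Date", "To", "Cc")


def _body_text(msg):
    """Return the undecoded body text, joining parts of multipart mail."""
    if msg.is_multipart():
        return "\n".join(_body_text(part) for part in msg.get_payload())
    return msg.get_payload() or ""


def content_id(raw, headers=CONTENT_ID_HEADERS):
    """Hash the selected headers plus the body (assumed digest: SHA-256)."""
    msg = message_from_string(raw)
    digest = hashlib.sha256()
    for name in headers:
        for value in msg.get_all(name, []):
            # Collapse whitespace so header folding does not change the id.
            folded = " ".join(value.split())
            digest.update(f"{name.lower()}:{folded}\n".encode("utf-8"))
    digest.update(_body_text(msg).encode("utf-8"))
    return digest.hexdigest()


def is_duplicate(new_raw, existing_by_mid):
    """existing_by_mid maps a Message-ID to a list of raw messages."""
    mid = (message_from_string(new_raw).get("Message-ID") or "").strip()
    candidates = existing_by_mid.get(mid, [])
    if not candidates:
        # No Message-ID match, so content_id is never computed.
        return False
    new_cid = content_id(new_raw)
    return any(content_id(old) == new_cid for old in candidates)
```

With this shape, two copies of a message that differ only in headers
outside the chosen set (Received, for example) compare as duplicates,
while a message reusing a Message-ID with a different body does not.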