From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
    shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
    by dcvr.yhbt.net (Postfix) with ESMTP id 6B5781F8C6;
    Mon, 28 Jun 2021 22:12:36 +0000 (UTC)
Date: Mon, 28 Jun 2021 22:12:36 +0000
From: Eric Wong
To: Konstantin Ryabitsev
Cc: meta@public-inbox.org
Subject: Re: Recording archiver origins in git
Message-ID: <20210628221236.GA25399@dcvr>
References: <20210628212610.zbexle2b2uvux2sd@nitro.local>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20210628212610.zbexle2b2uvux2sd@nitro.local>
List-Id:

Konstantin Ryabitsev wrote:
> Hello:
>
> I'm working away on grokmirror+public-inbox replication, and I'm trying to
> come up with a good solution for passing the "archiver origins" info. In
> examples/grok-pull.post_update_hook.sh, we try to get this information out of
> a curl call to the clone origin, but this may not be reliable for a number of
> reasons:
>
> 1. we may be cloning from an intermediary location that only serves the git
>    repositories and the manifest file (e.g. erol.kernel.org)
> 2. the call may retrieve information relevant to the intermediary, and not to
>    the origins of the archive
>
> I'm thinking of including a special location in the git repo itself for
> passing some of the same info currently found in the config snippet, e.g. in
> refs/meta/origins.
> Imaginary code snippet:
>
> $ git show refs/meta/origins:i
> [metadata]
> source = smtp

Is "source" necessary?  It seems like something that could be in
the "description" file or noted in the contents of
publicinbox.$NAME.infourl.

> listaddress = linux-kernel@vger.kernel.org
> listid = linux-kernel.vger.kernel.org
> archive-url = https://lore.kernel.org/linux-kernel
> archive-contact = postmaster@kernel.org

I think the keys should match what we use in the config file, at
least.  So s/listaddress/address/ and s/archive-url/url/

I'm not sure if "contact" is necessary if the aforementioned
"infourl" exists.

> This way, even if the archive gets mirrored around a bunch of times, it's
> still possible to track where it originated, and if the original archive info
> becomes obsolete, someone can update the information without it affecting the
> rest of the archive.
>
> Does that sound sane?

I think so.  Only the latest epoch would be taken into account,
I suppose.
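
To make the key renaming and the "latest epoch" idea concrete, here is
a rough sketch in Python.  Everything in it is an assumption for
illustration only: parse_origins(), latest_epoch() and the RENAMES
table are hypothetical names, and the [metadata] layout is just the
imaginary snippet quoted above.  Python's configparser stands in for a
real git-config parser (the two syntaxes overlap for simple key/value
pairs, but are not identical), and the epoch names assume the
N.git naming used for v2 inbox epochs.

```python
import configparser

# Hypothetical mapping from the proposed key names to the names used
# in the config file (publicinbox.$NAME.address, publicinbox.$NAME.url).
RENAMES = {"listaddress": "address", "archive-url": "url"}


def parse_origins(text):
    """Parse an INI-style origins blob and return normalized metadata."""
    # interpolation=None so a literal '%' in a value is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # Rename keys that differ from the config file; keep the rest as-is
    return {RENAMES.get(k, k): v for k, v in parser["metadata"].items()}


def latest_epoch(names):
    """Return the highest epoch number among names like '0.git', '1.git'."""
    epochs = [
        int(n[:-4]) for n in names if n.endswith(".git") and n[:-4].isdigit()
    ]
    return max(epochs) if epochs else None


# Sample input modeled on the imaginary snippet from the quoted mail
sample = """\
[metadata]
listaddress = linux-kernel@vger.kernel.org
listid = linux-kernel.vger.kernel.org
archive-url = https://lore.kernel.org/linux-kernel
"""

meta = parse_origins(sample)
```

A consumer would first pick the epoch with latest_epoch(), read the
blob from that epoch only, and then look up meta["address"] and
meta["url"] under the same names the config file already uses.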