From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <6db819abb507b5b83d5d38e2708b29f77c46d8c0.camel@gmail.com>
Subject: Re: On raw strings in commit field
From: Liliana Marie Prikler
To: zimoun, Ricardo Wurmus
Cc: guix-devel@gnu.org
Date: Fri, 31 Dec 2021 21:52:46 +0100
In-Reply-To: <86bl0w667k.fsf@gmail.com>
References: <6e451a878b749d4afb6eede9b476e5faabb0d609.camel@gmail.com>
 <86y243kdoo.fsf@gmail.com>
 <899587fb6a76ddfa37d197d3d0fd23cdc7ad8592.camel@gmail.com>
 <867dbmi7pf.fsf@gmail.com>
 <3d448fe42f0c43574db96fa26aecd7da5fd5a95d.camel@gmail.com>
 <86k0flpnx5.fsf@gmail.com>
 <9a5e3e7f44155146d731dd5a97450a9ff9dff5ab.camel@gmail.com>
 <87bl0xglul.fsf@elephly.net>
 <86ilv46hls.fsf@gmail.com>
 <86bl0w667k.fsf@gmail.com>
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id: "Development of GNU Guix and the GNU System distribution."
Authentication-Results: aspmx1.migadu.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=kZ9dB26U; dmarc=pass (policy=none)

Hi,

On Friday, 31 Dec 2021 at 18:21 +0100, zimoun wrote:
> Redundancy adds one kind of robustness: resilience.  [...]  However
> this assumes all the redundant nodes of the web of nets will be still
> up, at least enough to have this…  robustness.  Me too, I hope Guix
> will be popular and all redundancies still running when I will be old
> or dead.  But I will not bet on that assumption.
I think we can live with one or two redundant nodes dying over time;
the great thing about redundancy is that things will still work out
fine if a sufficient number of them (typically one) is still around at
the time of query.  So it'd be robust enough to actually work for,
let's say, 10 years, but I have no illusions that our time machine will
ever be able to go back a lifetime (which is also a reason why I don't
think using commit hashes everywhere will magically result in the
robustness that appears to be desired here).

> What Timothy is doing with Preservation of Guix and a window of
> ~2years shows that any web of nets is really fragile.  I do not see
> why the one we are building around Guix will be different.
>
> Instead of trying to have robustness by adding more and more, from my
> point of view, it appears to me the occasion to rethink and try to
> have robustness with less.
>
> I agree with you that various fallbacks is one good direction to go.
> SWH is one thing because it is currently well supported (by UNESCO
> for instance).  But many others are also worth.  Maybe IPFS or GNUnet
> are worth.
Why not both?  Or all three, because for what it's worth SWH will also
be around for a while.
We would still need federated Disarchive instances to match origins to
SWH IDs, IPFS files and whatever GNUnet has.

> > > It is a difficult topic to know what information the ’uri’ field
> > > should contain for robust long-term; a topic with a lot of
> > > unknowns, although many solutions are around, they are a strong
> > > change of habits and changing my own habits is already hard, so a
> > > collective change is a big collective challenge. :-)
> > We're going back to Cantor's argument for raw commits.  I'm not
> > opposed to using commits as value of the commit field (let-bound
> > commits reflected in the version, that is), but let's not forget
> > that this robustness argument still presupposes that the (commit
> > tag) binding is the point of failure.  This probably holds to some
> > degree for "npm-something", but we also have a fair amount of e.g.
> > GNOME-related packages which we trust to have robust tags and the
> > only reason we don't use mirror://gnome to refer to them is because
> > it's not in GNOME mirrors (yet).
>
> Because this point of failure for tag potentially exists, the
> counter-measure would be to add more (check integrity, fallback to
> other servers, etc.) and even it could be impossible if the tag
> changed and propagated to all.
>
> I am not saying neither that we have to replace tomorrow all the tags
> by commit hashes.  My point is just that this tag in the ’uri’ field
> does not appears to me a correct design.  For sure, I agree it is
> convenient but I think it is not The Right Thing.  Sadly, I do not
> know what The Right Thing is – and commit hash is probably not The
> Right Thing but it seems to me a direction to explore.
I don't think there's a single Right Thing to be had here.

> > > For instance, SWH promotes swhid instead of DOI for referencing
> > > the publications.  I am not sure it is really popular outside a
> > > small French subgroup.
> > > ;-)
> >
> > Completely off-topic, but isn't part of the point of DOIs that you
> > can fetch the revised paper as well?  I can understand putting
> > OpenData behind an SWH ID rather than a DOI, but the paper itself?
> > Why?
>
> If you find it off-topic, fine.  My point is to say that DOI
> (extrinsic) is not known to not be The Right Thing for referencing
> and intrinsic identifier is really better but it seems hard to
> convince people to switch.
>
> For instance, DOI is known to be fragile because it relies on an
> external centralized mutable index to have the bijection between the
> identifier and the content.  If today I cite doi:123abc then tomorrow
> when you reach this very same identifier doi:123abc, then you have no
> guarantee that it is the same content.  Obviously, it is not an issue
> by itself, but in scientific context where fraud is something, once
> the centralized mutable index is corrupted, done!
I'm not sure to what extent there's a central index on all DOIs.  As
far as I can see, most things are actually handled by DOI registration
agencies, which of course one could possibly corrupt in much the same
manner.  But you don't just cite a DOI, typically.  You also have all
that analog stuff like author, title, publisher, etc.  Assuming the
publisher (or an archive of their publications) still exists, you can
use that to cross-check.

> Because SWH-ID only depends on the content itself, it allows
> decentralization and integrity check.
>
> Do not take me wrong, I am not comparing Git SHA-1 hash with an
> integrity check. :-)  Well, maybe the interested reader can give a
> look at:
>
> <https://www.softwareheritage.org/2020/07/09/intrinsic-vs-extrinsic-identifiers/>
>
> All in all, I was trying to point that this extrinsic vs intrinsic
> thing is bigger than ’git-fetch’ and commit hash vs tag and the root
> appears to me in exploring what the ’uri’ field should contain.  This
> DOI was an example to show the topic is not easy.
Point taken, "it's not easy" is something we can all easily agree on :)

But the larger issue with DOIs vs SWH IDs is that I typically don't
need to refer to other papers by exact content, which those intrinsic
tagging mechanisms rely on.  If I quote a book from 2015 and you read
the 2025 edition, chances are that the main body is still the same,
with perhaps one or two typos fixed and a new foreword.

For future academics, it might also be interesting to know whether what
I claimed back in 2022 still holds then or if it has since been
superseded.  For historians, it might instead be valuable to
periodically check whether the content behind the DOI changes and, if
so, archive a new snapshot (similar to what archive.org, SWH et al.
do).  Then, if the DOI gets lost or some evil company or government
tries to bring out a censored version of my paper or the paper I'm
citing, you can browse the archive to check what's behind all those
sections that have been painted black.

Note that the archive must be able to be queried in much the same
manner as you'd type a query in a normal search engine.  If it only
relied on content tagging, the evil agency could simply hand you a
broken ID or even one that refers to a maliciously crafted page of
theirs.  Assuming they let you track down my paper in the first place.

TL;DR (even though you should read the full thing anyway): Despite what
archives specializing in intrinsic identifiers might tell you, they are
not a panacea.  I could go even further off-topic and show that NaCl is
a social construct, but I'd rather stop here.

Cheers