From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 16 Mar 2024 20:49:44 +0100
From: Tomas Volf <~@wolfsden.cz>
To: Ian Eure
Cc: Christopher Baines, guix-devel@gnu.org
Subject: Re: Concerns/questions around Software Heritage Archive
Mail-Followup-To: Ian Eure, Christopher Baines, guix-devel@gnu.org
References: <87il1mupco.fsf@meson> <87cyruqcfe.fsf@cbaines.net> <87edcaug07.fsf@meson>
MIME-Version: 1.0
Content-Type:
multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="sD+WLf66kyf+tyHc"
Content-Disposition: inline
In-Reply-To: <87edcaug07.fsf@meson>
List-Id: "Development of GNU Guix and the GNU System distribution."

--sD+WLf66kyf+tyHc
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On 2024-03-16 12:06:27 -0700, Ian Eure wrote:
>
> Christopher Baines writes:
>
> > [[PGP Signed Part:Undecided]]
> >
> > Ian Eure writes:
> >
> > > Hi Guixy people,
> > >
> > > I’d never heard of SWH before I started hacking on Guix last fall,
> > > and it struck me as rather a good idea. However, I’ve seen some
> > > things lately which have soured me on them.
> > >
> > > They appear to be using the archive to build LLMs:
> > > https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
> > >
> > > I was also distressed to see how poorly they treated a developer who
> > > wished to update their name:
> > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> > >
> > > GPL’d software I’ve created has been packaged for Guix, which I
> > > assume means it’s been included in SWH. While I’m dealing with
> > > their (IMO: unethical) opt-out process, I likely also need to stop
> > > new copies from being uploaded again in the future.
> > >
> > > Is there a way to indicate, in a Guix package, that it should
> > > *never* be included in SWH?
> >
> > Not currently, and I don't really see the point in such a mechanism. If
> > you really never want them to store your code, then you need to license
> > it accordingly (and not make it free software).
> >
>
> I don’t want my code in SWH *because* it’s free. A primary use of LLMs is
> laundering freely licensed software into proprietary, commercial projects
> through "AI" code completion and generation. Any Free software in an LLM
> training set can and will be used in violation of its license, without a
> clear path for the author to seek recourse. I deleted my code off Github
> and abandoned it completely for this exact reason, and am deeply irked to be
> going through this nonsense again.
>
> A more salient question may be: Is there a process within Guix (either the
> program or the organization) which uploads source to SWH? Or does it rely
> on SWH independently?

`guix lint PKG-NAME' schedules SWH archival if possible. No code is directly
uploaded (at least currently), so assuming you have an IP list of SWH, it should
be possible to block it. At least AFAIK.
If you have the list, or know how to get it, could you share it? I would be
interested in blocking it as well from my git hosting.

>
> If the latter, my problem is likely solved by blocking SWH at my network
> edge and opting out of their archive (or trying to) and the downstream
> training models they’ve already put it in. If the former, the only control
> I currently have to protect my license is removing packages from Guix which
> contain it. I don’t want that outcome.
>
> Noting also that the path here seems to be SWH->huggingface->bigcode
> training set, and the opt-out process for the training set appears to be a
> complete sham. To opt-out, you must create a Github Issue; only one opt-out
> has *ever* been processed, and there are 200+ sitting there, many with no
> response for nearly a year[1]. I want no part of any of this.
> >
> > > Is there a way to tell Guix to never download source from SWH?
> >
> > Also no, and it's probably best to do this at the network level on your
> > systems/network if you want this to be the case.
> >
>
> I’ll investigate this, though I’d prefer if there was a way to configure
> source mirrors in the Guix daemon.
> >
> > Skipping back to this though:
> >
> > > I was also distressed to see how poorly they treated a developer who
> > > wished to update their name:
> > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> >
> > This is probably worth thinking about as Guix is in a similar situation
> > regarding publishing source code, and people potentially wanting to
> > change historical source code both in things Guix packages and Guix
> > itself.
> >
> > Like Software Heritage, there's cryptographical implications for
> > rewriting the Git history and modifying source tarballs or nars that
> > contain source code.
> >
> > We have 17TiB of compressed source code and built software stored for
> > bordeaux.guix.gnu.org now and we should probably work out how to handle
> > people asking for things to be removed or changed (for any and all
> > reasons).
> >
> > It's probably worth working out our position on this in advance of
> > someone asking.
> >
>
> Yes, I agree that Guix needs a better solution for this.
>
> Thanks,
>
> — Ian
>
> [1]: https://github.com/bigcode-project/opt-out-v2/issues
>

T.

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

--sD+WLf66kyf+tyHc
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEt4NJs4wUfTYpiGikL7/ufbZ/wakFAmX199gACgkQL7/ufbZ/
walRVBAAo9pnnOXliNJlxrv89k/AXQABRLLr+axMiLZkKsJax4LX1Zey3W9ej3a6
uyFw+k8tlsSo1jfITw7lrr/KweBAKf9Zao+OoCFAkVn5mwd6k0XLGr8axEzZK8rE
au+BV7CGgaG1lBSn1cmLZzPHIZLk04Z1jXp7SYCwf396kAnQb67M09AbU6ScRuf3
QkPaVz1t+av3OVU1tYZyNc6QLZ+yu50CXyUJgaXsr15hwr+BAq3vk7ftQNAu2rTU
VHIbyWbAduALIlbGZiiKRFUThLajA7XcyaslBLA6SWdn+i/6H9rMGF5NweRLlLeQ
H6j5EaaSQQKal9LzkD6qpk/2+oZ6Eb3TCdk9taAGo8vnRdHJc0yi9BfGQwZ6hSOe
lyd5GKsamkSZ4Ln4TncQdXbxwtHqPy7/QRT1c1X9kAFrC396ZSssmkrzHiTMCViy
ImYiUJQYIqLckK40DhnO42t+3hOk8jW7fSZnZlvcKl6SHwiHAD7AVjXAY0SFpFIi
Is8euUXf4JGZkxtGNtP772SH8MFbi13N/O/pfFb+YYhB5H6NXQz1PyNbde9RJBQG
PV7xYqVpCYnjoq9OjBIGdVXc2AJAEHjBk2pTC/58zrjFI/C0nmB3GVxE5WrvM88W
IV3EfolxuCKG3FtE2THP7TNPEw1LtVPgcvyRIt8HHuY2Twqsqw8=
=udtX
-----END PGP SIGNATURE-----

--sD+WLf66kyf+tyHc--