From: Ian Eure
To: guix-devel@gnu.org
Subject: Re: Next Steps For the Software Heritage Problem
Date: Tue, 18 Jun 2024 07:19:26 -0700
In-reply-to: <20240618113717.4a6bad2b@fannys.me>
Message-ID: <8734pa5mlx.fsf@meson>

Hi MSavoritias,

Thank you for the email.

I’m going to lay out this situation as clearly as I can, in the hope
that others will better understand it and, hopefully, treat it with the
seriousness it deserves.

1. Guix asks SWH to archive some source code. This is fine.

2. SWH archives the code. This is also fine.

3. SWH gives all of its source code to an AI company, HuggingFace. This
is questionable. While fine in theory, the company they gave it to,
HuggingFace, violates both the licenses of the code it’s given and
SWH’s own policy on LLMs.
Instead of terminating the partnership, SWH has
continued to tout it as "responsible AI" in the face of these
violations[1]. This makes me doubt whether they’re acting in good faith.

4. HuggingFace trains an LLM on all the code it’s given and
redistributes it. This is *not* fine. The LLM is a derivative work of
the source code it’s trained on, which violates the licenses of many
projects in its training set -- it’s akin to compiling a gigantic .so
file built from the SWH dataset.

5. HuggingFace uses its StarCoder2 LLM to generate source code. This is
*also* not fine. This output is also a derivative work of the inputs,
and it’s redistributed with no license or attribution whatsoever.
HuggingFace purports to include attribution in its model; however, its
own tools make no use of it and emit code with no attribution. You can
observe this behavior yourself:
https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground

I understand Guix’s participation is several degrees removed from where
the core of the problem lies. However, the partnership with SWH is
indirectly enabling massive violations of the licenses of the software
Guix packages. Guix should stop doing that.

Thanks,

— Ian

[1]: https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/

MSavoritias writes:

> Hello,
>
> Context:
>
> As you may already know, there have been discussions around Software
> Heritage and the LLM model they are collaborating with for a bit now.
> The model itself was announced at
> https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code/
>
> As I have started writing some packages, I became interested in how I
> might actually stop my code from ever reaching Software Heritage, or at
> the very least said LLM model. Every single package in Guix is added
> there automatically.
>
> I sent an email on Friday and got an answer back that such a consent
> mechanism hasn't been implemented, and I was shown the legal terms.
> Instead, what I am supposed to do is:
>
> After Guix has my code, my code will automatically be in Software
> Heritage and the LLM model. So I am supposed to opt out separately with
> both of them to ensure that my code won't be used for future versions.
> This of course means that my code will stay forever in Software
> Heritage and the LLM model (or some version of it, at least).
>
> The reasoning that was given was that code harvesting happens anyway
> and they offer an opt-out. I am guessing it's opt-out and not opt-in
> because they would have less code otherwise, but this is speculation,
> of course :)
>
> This is against our desire to make Guix a welcoming space, and also
> against the spirit of our CoC, specifically because authors do not
> know this happens when they submit packages to Guix. So it is all done
> without consent.
>
> Next Steps:
>
> So what can we do as a Guix community from here?
> Communication/Writing wise:
>
> 1. Add a clear disclaimer/requirement that, for any new package added
> to Guix, the submitter has to give consent or get consent from the
> person who wrote the package. This needs to be added to the docs and
> to the email procedures.
> 2. Make a blog post of our stance towards Software Heritage and the
> code harvesting they are doing. This post will explain on environmental
> and ethical grounds why Guix is against this, and will mention Software
> Heritage specifically. This is done to distance ourselves and state
> that we do not like what is happening in case anyone comes asking, and
> hopefully to put public pressure on Software Heritage.
> 3. Exclude all Software Heritage merch, stands, talks, people in an
> official capacity, logos, or anything else that participates in social
> events of Guix, and write it into some rules we have.
> Also write in the channel rules that Software Heritage is off-topic,
> the same way non-free software is off-topic.
> 4. There doesn't seem to be any movement on the side of Guix towards:
> - Accountability in an official capacity of SH for the terrible
> handling of the trans name incident, and a plan to make it easier in
> the future.
> - The LLM problem that was mentioned in this email.
> So, with that said, I urge anybody who has been in contact with them
> in an official Guix capacity to come forward; otherwise I can volunteer
> to be that. Idk if we have a community outreach thing I need to be in
> for that as well (we should if not).
>
> The above makes two assumptions:
> 1. That the Guix community is against LLM/"AI", which on environmental
> and ethical grounds we should be.
> 2. That we are a consent culture.
>
> Coding-wise, this has been talked about before. Some potential options
> are:
> - Communicate with Software Heritage to be able to give a "sign" of
> whether the code that is sent should go into the code harvesting
> project.
> - Remove all Software Heritage integration, since it's too hard to be
> ethical about it, and build a better solution.
>
> Conclusion:
>
> To summarize the steps I wrote above, it seems Software Heritage makes
> it harder and harder for us to actually be the inclusive, welcoming
> space we want to be. Idk what that leaves us, as I said I am not part
> of any "insider" discussions. But things don't seem to be moving much,
> and it's time to start doing actionable things in another direction.
>
> MSavoritias