Date: Fri, 21 Jun 2024 12:08:02 +0300
From: MSavoritias
To: Simon Tournier
Cc: Ian Eure, guix-devel@gnu.org
Subject: Re: Next Steps For the Software Heritage Problem
Message-ID: <20240621120756.1f2c3375@fannys.me>
In-Reply-To: <871q4roex2.fsf@gmail.com>
References: <87a5jh74jf.fsf@gmail.com> <20240619121338.71b5f340@fannys.me> <87plsd9eqq.fsf@gmail.com> <20240620095117.6b3d3b3b@fannys.me> <871q4roex2.fsf@gmail.com>
List-Id: "Development of GNU Guix and the GNU System distribution."

On Thu, 20 Jun 2024 16:40:57 +0200
Simon Tournier wrote:

> Being concrete and explicit, could you please share:
>
>  1. Which part of your code is included in the pretraining dataset?
>
>     It’s easy, you can copy/paste a snippet and it returns the location
>     from where it comes from.
>
>         https://huggingface.co/spaces/bigcode/search-v2a
>
>
>  2. What is your code that is included in SWH archive?
>
>     Again, it’s easy: checkout some commit of your repository, then
>     inside this repository, you can run:
>
>         echo "https://archive.softwareheritage.org/swh:1:dir:$(guix hash -S git -f hex -H sha1 .)"
>
>     Do not miss the ’.’ (dot) once entering the repository.  This
>     command returns SWHID.  Other said, using this identifier, you might
>     know if the repository is stored by SWH.  (Be careful with temporary
>     artifacts as .go files or else.)
>
>     Or you can also check for one specific content:
>
>         $ echo "https://archive.softwareheritage.org/swh:1:cnt:$(guix hash -S git -f hex -H sha1 COPYING)"
>         https://archive.softwareheritage.org/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2
>
>     And the URL display the content of the file COPYING.  Here GPL 3
>     license for instance.
>
>
>  3. Where such source code from #2 and #3 is packaged by Guix?

My code is not yet in Guix. My question, and the actions I proposed, came
about because I want to submit my package to Guix, but the minute I do,
it is shared with SWH without my consent.

> That said, if the source is hosted on GitHub or GitLab.com or SourceHut
> or CodeBerg or some other popular forges or even mirrored without your
> consent on one of these, please consider that your code had been
> ingested by ChatGPT without any mean to verify.  Obviously, that’s not
> an argument to accept the situation with HuggingFace and I understand
> that you do not want that your publicly release copyleft source code
> could be reused by any LLM.
>
> However, as said several times, rooting this willing of non-inclusion is
> larger than your own willing once you publicly released such source code
> under some copyleft license.  I hope we agree on that.
>
> Again, I am not trying to avoid something.  And again, we all have heard
> your points.  Nothing is ignored.  To my knowledge, the path forward is
> not yet well-defined.
>
> Since we are discussing at length with various different inputs, it
> means that a common understanding and/or opinion does not seem obvious.

Let me put it more clearly. I am NOT asking SWH to stop training the LLM,
and I am NOT asking Guix to take a stance against LLMs. I also know that
my code is going to be harvested anyway.

What I DO ask is:

1. For SWH to make the sharing of code with the LLM strictly opt-in.
2. For Guix not to enable that behavior until that is fixed, because it
   is against our social rules and CoC.

For the second point, I already outlined in my first emails some steps we
could take to protect our package authors and show our disagreement.
Also, in the XMPP chat it was shared that Guix can just stop sending new
package code until an opt-in system is in place.

> >> Well, I do not know if the outcome will be aligned with your current
> >> opinion, but be sure that your concerns as the others raised by Guix
> >> community members are taking into account.
> >
> > Thank you for giving me an honest and detailed answer.
>
> I feel you are pushy on the topic and for what my opinion is worth, it
> is not helpful to raise again and again that you want a way to opt-out.
> Yeah, people got it. :-)  And you are probably not alone, I guess.

Ah, I am not pushing for what I want, though; that is not how the thread
started :)
The thread started with me saying what I am going to DO concretely about
the SWH problem, that is all.
I already listed some practical steps, if you read the first email, and I
am going to start sending PRs/MRs/emails soonish, as I said, to move this
forward.
I just wanted to give the list a heads-up so it doesn't come out of
nowhere.

> I do not have special information from SWH but I am sure SWH people are
> working on the topic.  And again, maybe the outcome will not be aligned
> with your opinion.  Another story.
>
> Now, the other question you ask to Guix: do we continue to help SWH in
> harvesting?  You propose to stop, IIUC.  Ok, we got it, too. :-)  From my
> point of view, the path forward is not to speak on the abstract but to
> root on concrete numbers; it would help in bounding what we are speaking
> about.
>
> Concretely, if you would like to be able to opt-out, could you point:
>
>  1. the piece from the Guix source code you are the author?
>
>  2. source code you are the author that is packaged by Guix?
>
> Again, I am not trying to avoid the discussion.  Instead, I would prefer
> to root the discussion on concrete examples.  Then it would appear to me
> easier to make progress.
>
> As Greg or Ekaitz also wrote: opting out has implications on the meaning
> of freedom behind “free software“.

I mean, it does if you think that:

1. Guix doesn't have any social rules on top of the FSF definition (it
   does) and that it doesn't respect consent.
2. It's not about the context of something. For example, the GPL and our
   CoC restrict freedom so that people can be more free to express
   themselves :)

> IMHO, that’s not because we would like to opt-out that we could, would
> be able to or allowed to.  Therefore, instead of holding opinions on the
> abstract, let try to make progress and start on the concrete: which
> piece of source code are we speaking about?

The software here -> https://sr.ht/~msavoritias/
The minute I add it to Guix, the code is going to be in SWH.
This is not only about my software, but that is the example you wanted.

MSavoritias

> Cheers,
> simon
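
A side note on the `swh:1:cnt:` identifier from the quoted `guix hash` command: for a single file it appears to be the same SHA-1 that Git computes for a blob, so it can be reproduced without Guix. The following is only a minimal sketch under that assumption; the names `swhid_for_content` and `archive_url` are made up for illustration and are not part of Guix or any SWH tool.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    # Assumption: the content hash is Git's blob hash, i.e. SHA-1 over
    # the header b"blob <size>" + NUL byte, followed by the raw bytes.
    header = b"blob %d\x00" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def archive_url(swhid: str) -> str:
    # Same URL shape as in the quoted command.
    return "https://archive.softwareheritage.org/" + swhid
```

Calling `swhid_for_content(open("COPYING", "rb").read())` should give the same identifier as the quoted command, provided the file is byte-for-byte identical. This covers single files only; the `swh:1:dir:` identifier for a whole checkout is computed differently and is not sketched here.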