From:
 Ludovic Courtès
To: Ian Eure, MSavoritias
Cc: guix-devel@gnu.org
Subject: Re: Next Steps For the Software Heritage Problem
In-Reply-To: <87ed8i4btv.fsf@meson> (Ian Eure's message of "Thu, 27 Jun 2024 08:30:39 -0700")
References: <87tthq3yr5.fsf@meson> <87r0ci7eq1.fsf@gnu.org> <87ed8i4btv.fsf@meson>
Date: Thu, 27 Jun 2024 18:58:39 +0200
Message-ID: <87h6de1fwg.fsf@gnu.org>

Hi,

Ian Eure wrote:

> While this is what their paper claims[1], it doesn’t appear to be
> true, since I can see my own GPL’d code in the training set. I’ve
> since moved nearly all of my code off GitHub, but if you visit their
> "Am I in The Stack?" page[2] and enter my old username ("ieure"), you
> will see pretty much every repository I ever hosted there, including
> both unlicensed and GPL’d code.

That’s not my experience: I looked for Guix and Coreutils, both GPL’d,
both mirrored on GitHub, and none of it is there.

> Some examples are hyperspace-el,
> nssh-el, tl1-mode, etc. While there aren’t LICENSE files in those
> repos, the file headers of all clearly indicate that they’re GPL’d.

Well, not providing a COPYING/LICENSE file isn’t helping either: file
headers may not be all that clear to a parser.

At any rate, even though I’m watching this LLM trend with discontent
like many in the free software world, I believe this discussion is
missing the point and shooting the messenger(s).

One of the three missions of SWH is to share code, much like
ftp.gnu.org. That’s all they did. Anyone can access the archive of
SWH, for any purpose.

HuggingFace trained “BigCode” on source code that SWH harvested from
GitHub (a subset of the SWH archive) and chose to abide by the
principles put forward by SWH in its Oct. 2023 statement. HuggingFace
didn’t have to do that; they could have acted like Microsoft and all
the “AI” companies and just scraped everything without asking anyone,
be it from SWH or from other sources.

There is no “Software Heritage problem” and really, that very phrase
and the accusatory tone in this thread are unwelcome and below our
standards for communication in Guix. This has gone too far.

This is not the place to further discuss the impact of using LLMs on
free software, and definitely not the place to throw unfounded
accusations.

Thanks,
Ludo’.