Message-ID: <7881988a-6a95-f0d8-8d6b-6794651c9d2c@fannys.me>
Date: Mon, 18 Mar 2024 13:47:42 +0200
Subject: Re: Concerns/questions around Software Heritage Archive
From: MSavoritias
To: Simon Tournier, Ian Eure, guix-devel
In-Reply-To: <87a5mvyjl4.fsf@gmail.com>
References: <87il1mupco.fsf@meson> <87a5mvyjl4.fsf@gmail.com>
List-Id: "Development of GNU Guix and the GNU System distribution."

On 3/18/24 11:28, Simon Tournier wrote:
> Hi,
>
> On Sat, 16 Mar 2024 at 08:52, Ian Eure wrote:
>
>> They appear to be using the archive to build LLMs:
>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>
> About LLM, Software Heritage made a clear statement:
>
> https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code
>
> Quoting:
>
>     We feel that the question is no longer whether LLMs for code
>     should be built. They are already being built, independently of
>     what we do, and there is no turning back. The real question is
>     how they should be built and whom they should benefit.
>
>     Principles:
>
>     1. Knowledge derived from the Software Heritage archive must be
>     given back to humanity, rather than monopolized for private
>     gain. The resulting machine learning models must be made available
>     under a suitable open license, together with the documentation and
>     toolings needed to use them.
>
>     2. The initial training data extracted from the Software Heritage
>     archive must be fully and precisely identified by, for example,
>     publishing the corresponding SWHID identifiers (note that, in the
>     context of Software Heritage, public availability of the initial
>     training data is a given: anyone can obtain it from the
>     archive). This will enable use cases such as: studying biases
>     (fairness), verifying if a code of interest was present in the
>     training data (transparency), and providing appropriate attribution
>     when generated code bears resemblance to training data (credit),
>     among others.
>
>     3. Mechanisms should be established, where possible, for authors to
>     exclude their archived code from the training inputs before model
>     training begins.
>
> I hope it clarifies your concerns to some extent.
>
>
> Moreover, you wrote: « I want absolutely nothing to do with them. »
>
> Maybe there is a misunderstanding on your side about what “free
> software” and GPL means because once “free software”, you cannot prevent
> people to use “your” free software for any purposes you dislike.
>
> If you want to bound the use cases of the software you create, you need
> to explicitly specify that in the license. And if you do, your software
> will not be considered as “free software”.
>
> That’s the double sword of “free software”. :-)

Simon,

1.

You seem to be misunderstanding the statement that was made here. What
you can do legally and what you can do socially are not always the same
thing. As advice for the future: when somebody voices a concern or a
wish, your first response shouldn't be "but it's legal", because that
completely dismisses any constructive discussion that could be had. And
you seem to be talking about legality a lot here, so that's not a good
look.

Yes, legally Ian probably can't get lawyers on you. But nobody is
talking about legality here. What is in question is whether Software
Heritage respects people enough to do the right thing and respect their
wishes without getting lawyers involved.

Besides, with the way you are framing Free Software as not respecting
any social rules, you make Free Software unattractive, which is the
opposite of what we are trying to do here :)

2.

> Somehow, a Content-Addressed system is designed around immutable
> content. And if one know how to implement a Content-Addressed system
> relying on mutable content, I would be very interested to know more
> about it.

Please refrain from making such remarks.
Nobody here suggested anything like what you mention, and by arguing
like this you effectively devalue the discussion and frame other people
as stupid.

3.

It's not on the people who are not included to write the code. If Guix
is to be an inclusive project, then Guix should do the work so that
people feel included. You may disagree with this, sure, but shutting
down the discussion because nobody wrote the code for you is very
elitist of you.

4.

> This language is not acceptable on Guix channel of communication.

Calling out transphobia is very much accepted here, actually :)
It's transphobic speech that is not accepted.

I would welcome an announcement or some other official communication
from Software Heritage stating their stance on this. I still wouldn't
use them, though, due to the LLM and AI stuff that they are involved
in. I hope that at some point they realize their mistake.

MSavoritias