From: Simon Tournier
To: MSavoritias
Cc: Ian Eure, guix-devel@gnu.org
Subject: Re: Next Steps For the Software Heritage Problem
In-Reply-To: <20240619121338.71b5f340@fannys.me>
References: <87a5jh74jf.fsf@gmail.com> <20240619121338.71b5f340@fannys.me>
Date: Wed, 19 Jun 2024 16:41:33 +0200
Message-ID: <87plsd9eqq.fsf@gmail.com>

Hi MSavoritias, all,

Let me provide more context.

The concern started a couple of months ago, to my knowledge, and the
discussion is still ongoing. So I think it is incorrect to say “any
result for over 6 months”.

Moreover, I feel you have a misunderstanding about the HuggingFace and
SWH partnership. From my reading of the public information, HuggingFace
and BigCode train on a subset of the SWH source code archive. I mean, it
is a snapshot and, to my knowledge, they provided the list of source
code that had been used for training.

Not to avoid the question, but from a pragmatic point of view: if you
write source code and do not want it included in the training dataset,
one might ask whether that source code is concretely part of that
training dataset.

HuggingFace is not training continuously with source code from SWH.
And technically, SWH is an archive; i.e., the code is not stored hot. I
do not know, and I have not read, all the details HuggingFace gave about
their method; i.e., which kind of data they process: independent unique
files, complete repositories, etc. What I know is that the piece used
when fetching from SWH is named SWH Vault; it needs to “cook” and
prepare all the files, which takes time, from minutes to days.

All that to say two key points:

 1. People behind SWH are well aware of the various sides of the
    concerns. As said, they are long-time free software supporters. Be
    sure they have heard the community's concerns. Some discussions are
    still pending because, as explained, all sides of the ethical
    questions need to be handled cautiously. Please do not think it is
    ignored.

 2. FWIW, I am in touch with SWH people, as are other members of the
    Guix community. For instance, in order to feed the discussion,
    Roberto from SWH pointed me to this blog post by Bruce Perens:

    https://perens.com/2019/10/12/invasion-of-the-ethical-licenses/

Well, I do not know if the outcome will be aligned with your current
opinion, but be sure that your concerns, like the others raised by Guix
community members, are being taken into account.

Cheers,
simon