unofficial mirror of guix-devel@gnu.org 
From: Simon Tournier <zimon.toutoune@gmail.com>
To: MSavoritias <email@msavoritias.me>
Cc: Ian Eure <ian@retrospec.tv>, guix-devel@gnu.org
Subject: Re: Next Steps For the Software Heritage Problem
Date: Wed, 19 Jun 2024 16:41:33 +0200
Message-ID: <87plsd9eqq.fsf@gmail.com>
In-Reply-To: <20240619121338.71b5f340@fannys.me>

Hi MSavoritias, all,

Let me provide more context.

The concern was first raised a couple of months ago, to my knowledge,
and the discussion is still ongoing.  So I think it is incorrect to say
“any result for over 6 months”.

Moreover, I feel you have a misunderstanding about the partnership
between HuggingFace and SWH.  From my reading of the public information,
HuggingFace and BigCode train on a subset of the SWH source code
archive.  I mean, it is a snapshot and, to my knowledge, they provided
the list of source code that had been used for training.

Not to avoid the question, but from a pragmatic point of view, one might
ask whether the source code you write, and do not want included in the
training dataset, is concretely part of that training dataset.

HuggingFace is not training continuously with source code from SWH.

And technically, SWH is an archive, i.e., the code is not stored hot.  I
do not know, and I have not read all the details HuggingFace published
about their method, which kind of data they process: independent unique
files, complete repositories, etc.  What I know is that the component
used for fetching from SWH is named SWH Vault; it requires “cooking”,
i.e., preparing all the files, which takes time, from minutes to days.


All that to say two key points:

1. The people behind SWH are well aware of the various sides of the
concerns.  As said, they are long-time free software supporters.  Be
sure they have heard the community's concerns.  Some discussions are
still pending because, as explained, all sides of the ethical questions
need to be handled with caution.

Please do not think it is ignored.


2. FWIW, I am in touch with SWH people, as are other members of the Guix
community.  For instance, in order to feed the discussion, Roberto from
SWH pointed me to this blog post by Bruce Perens:

    https://perens.com/2019/10/12/invasion-of-the-ethical-licenses/

Well, I do not know if the outcome will be aligned with your current
opinion, but be sure that your concerns, like the others raised by Guix
community members, are being taken into account.


Cheers,
simon


