all messages for Guix-related lists mirrored at yhetil.org
From: "Ludovic Courtès" <ludo@gnu.org>
To: Ian Eure <ian@retrospec.tv>, MSavoritias <email@msavoritias.me>
Cc: guix-devel@gnu.org
Subject: Re: Next Steps For the Software Heritage Problem
Date: Thu, 27 Jun 2024 18:58:39 +0200
Message-ID: <87h6de1fwg.fsf@gnu.org>
In-Reply-To: <87ed8i4btv.fsf@meson> (Ian Eure's message of "Thu, 27 Jun 2024 08:30:39 -0700")

Hi,

Ian Eure <ian@retrospec.tv> skribis:

> While this is what their paper claims[1], it doesn’t appear to be
> true, since I can see my own GPL’d code in the training set.  I’ve
> since moved nearly all of my code off GitHub, but if you visit their
> "Am I in The Stack?" page[2] and enter my old username ("ieure"), you
> will see pretty much every repository I ever hosted there, including
> both unlicensed and GPL’d code.

That’s not my experience: I looked for Guix and Coreutils, both GPL’d,
both mirrored on GitHub, and none of it is there.

> Some examples are hyperspace-el,
> nssh-el, tl1-mode, etc.  While there aren’t LICENSE files in those
> repos, the file headers of all clearly indicate that they’re GPL’d.

Well, not providing a COPYING/LICENSE file isn’t helping either: file
headers may not be all that clear to a parser.


At any rate, even though I’m watching this LLM trend with discontent
like many in the free software world, I believe this discussion is
missing the point and shooting the messenger(s).

One of the three missions of SWH is to share code—much like ftp.gnu.org.
That’s all they did.  Anyone can access the archive of SWH, for any
purpose.

HuggingFace trained “BigCode” on source SWH harvested from GitHub (a
subset of the SWH archive) and chose to abide by the principles put
forward by SWH in its Oct. 2023 statement.  HuggingFace didn’t have to
do that; they could have acted like Microsoft and all the “AI” companies
and just scraped everything without asking anyone, be it from SWH or from
other sources.


There is no “Software Heritage problem” and really, that very phrase and
the accusatory tone in this thread are unwelcome and below our standards
for communication in Guix.  This has gone too far.  This is not the
place to further discuss the impact of using LLMs on free software, and
definitely not the place to throw unfounded accusations.

Thanks,
Ludo’.


