unofficial mirror of guix-devel@gnu.org 
From: MSavoritias <email@msavoritias.me>
To: Simon Tournier <zimon.toutoune@gmail.com>
Cc: Ian Eure <ian@retrospec.tv>, guix-devel@gnu.org
Subject: Re: Next Steps For the Software Heritage Problem
Date: Fri, 21 Jun 2024 12:08:02 +0300
Message-ID: <20240621120756.1f2c3375@fannys.me>
In-Reply-To: <871q4roex2.fsf@gmail.com>

On Thu, 20 Jun 2024 16:40:57 +0200
Simon Tournier <zimon.toutoune@gmail.com> wrote:

> Being concrete and explicit, could you please share:
> 
>  1. Which part of your code is included in the pretraining dataset?
> 
>     It’s easy, you can copy/paste a snippet and it returns the location
>     from where it comes from.
> 
>     https://huggingface.co/spaces/bigcode/search-v2a
> 
> 
>  2. What is your code that is included in SWH archive?
> 
>     Again, it’s easy: checkout some commit of your repository, then
>     inside this repository, you can run:
> 
>     echo "https://archive.softwareheritage.org/swh:1:dir:$(guix hash -S git -f hex -H sha1 .)"
> 
>     Do not miss the ’.’ (dot) once entering the repository.  This
>     command returns the SWHID.  In other words, using this identifier,
>     you can check whether the repository is stored by SWH.  (Be careful
>     with temporary artifacts such as .go files.)
> 
>     Or you can also check for one specific content:
> 
>   $ echo "https://archive.softwareheritage.org/swh:1:cnt:$(guix hash -S git -f hex -H sha1 COPYING)"
>   https://archive.softwareheritage.org/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2
> 
>     And the URL displays the content of the file COPYING.  Here, the
>     GPL 3 license, for instance.
> 
> 
>  3. Where is such source code from #1 and #2 packaged by Guix?
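
For readers without Guix at hand, the content check from item 2 can be sketched in Python. This is a minimal sketch, assuming that a `swh:1:cnt:` identifier is the Git blob SHA-1 of the file (the bytes `blob <size>\0` followed by the file content), which matches the COPYING hash shown in the quoted command. The function names `swh_content_id` and `swh_content_url` are hypothetical helpers, not part of Guix or of any SWH tool.

```python
import hashlib

# Base URL taken from the quoted command above.
SWH_ARCHIVE = "https://archive.softwareheritage.org"


def swh_content_id(data: bytes) -> str:
    """Return the Git blob SHA-1 of `data` as a hex string.

    Assumption: this is the hash used in swh:1:cnt: identifiers.
    """
    header = b"blob %d\x00" % len(data)  # Git object header: type, size, NUL
    return hashlib.sha1(header + data).hexdigest()


def swh_content_url(data: bytes) -> str:
    """Build the archive URL to open in order to check whether SWH holds this content."""
    return f"{SWH_ARCHIVE}/swh:1:cnt:{swh_content_id(data)}"


# Usage (hypothetical file name):
#   with open("COPYING", "rb") as f:
#       print(swh_content_url(f.read()))
```

This only covers single files. The directory identifier (`swh:1:dir:`) hashes a whole tree and is not reproduced here.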

My code is not yet in Guix. The question and the actions I proposed came about because I want to submit my package to Guix,
but the minute I do, it is shared with SWH without my consent.

> That said, if the source is hosted on GitHub or GitLab.com or SourceHut
> or CodeBerg or some other popular forges or even mirrored without your
> consent on one of these, please consider that your code has been
> ingested by ChatGPT without any means to verify.  Obviously, that’s not
> an argument to accept the situation with HuggingFace and I understand
> that you do not want your publicly released copyleft source code
> to be reused by any LLM.
> 
> However, as said several times, this wish for non-inclusion reaches
> beyond your own will once you have publicly released such source code
> under some copyleft license.  I hope we agree on that.
> 
> Again, I am not trying to avoid something.  And again, we all have heard
> your points.  Nothing is ignored.  To my knowledge, the path forward is
> not yet well-defined.
> 
> Since we are discussing at length with various different inputs, it
> means that a common understanding and/or opinion does not seem obvious.

Let me put it more clearly. I am NOT asking for SWH to stop training the LLM, and I am NOT asking Guix to take a stance against LLMs.
I also know that my code is going to be harvested anyway.
What I DO ask is:
1. For SWH to make the sharing of code with the LLM strictly opt-in.
2. For Guix not to enable that behavior until that is fixed, because it is against our social rules and CoC.
For the second point, I already outlined in the first emails some steps we could take to protect our package authors and show our disagreement.
Also, in the XMPP chat it was suggested that Guix can just stop sending new package code until an opt-in system is in place.

> >> Well, I do not know if the outcome will be aligned with your current
> >> opinion, but be sure that your concerns, like the others raised by
> >> Guix community members, are taken into account.
> >
> > Thank you for giving me an honest and detailed answer.  
> 
> I feel you are pushy on the topic and for what my opinion is worth, it
> is not helpful to raise again and again that you want a way to opt-out.
> Yeah, people got it. :-) And you are probably not alone, I guess.

Ah, I am not pushing for what I want, though; this is not how the thread started :)
The thread started with me saying what I am going to DO concretely about the SWH problem, that is all.
I already have some practical steps, if you read it, and as I said I am going to start sending PRs/MRs/emails soonish to move it forward.
I just wanted to give a heads-up to the list so it doesn't come out of nowhere.

> I do not have special information from SWH but I am sure SWH people are
> working on the topic.  And again, maybe the outcome will not be aligned
> with your opinion.  Another story.
> 
> Now, the other question you ask to Guix: do we continue to help SWH in
> harvesting?  You propose to stop, IIUC.  Ok, we got it, too. :-) From my
> point of view, the path forward is not to speak on the abstract but to
> root on concrete numbers; it would help in bounding what we are speaking
> about.
> 
> Concretely, if you would like to be able to opt-out, could you point:
> 
> 1. the piece of the Guix source code of which you are the author?
> 
> 2. source code of which you are the author that is packaged by Guix?
> 
> Again, I am not trying to avoid the discussion.  Instead, I would prefer
> to root the discussion on concrete examples.  Then it would appear to me
> easier to make progress.
> 
> As Greg or Ekaitz also wrote: opting out has implications on the meaning
> of freedom behind “free software”.

I mean, it does if you think that:
1. Guix doesn't have any social rules on top of the FSF definition (it does) and that it doesn't respect consent.
2. It is not about the context of something. For example, the GPL or our CoC restrict freedom so that people can be more free to express themselves :)

> IMHO, wanting to opt out does not mean that we could, would be able
> to, or would be allowed to.  Therefore, instead of holding opinions in
> the abstract, let us try to make progress and start on the concrete:
> which piece of source code are we speaking about?

The software here -> https://sr.ht/~msavoritias/
The minute I add it to Guix, the code is going to be in SWH.
This is not only about my software; it is just the example you wanted.

MSavoritias

> Cheers,
> simon
> 
>    


