On 2024-06-21, MSavoritias wrote:
> On Fri, 21 Jun 2024 11:46:56 +0200
> Andreas Enge wrote:
>> On Fri, Jun 21, 2024 at 12:12:13PM +0300, MSavoritias wrote:
>> > and as I mention in my first email I want to apply social pressure
>> > and make it clear to package authors what is happening so we can
>> > move to an opt-in model.
>>
>> Well, the opt-in model is in place: As soon as I put my code under a free
>> license on the Internet, I opt in for it to be harvested by SWH (and anybody
>> else, including non-friendly companies and state actors).
>
> That may be how you have understood it but that is not how most people
> understand it.
> See for example mirroring videos that creators have made online, or
> more recently some activitypub software harvesting posts for a search
> engine.

I think the fundamental difference is that such videos or activitypub
posts are not necessarily released under a license that *expressly*
permits sharing. In most cases, those posts and videos are released
without any license at all, and the person retains the legal, social,
moral and ethical rights to decide how that content is shared, if at
all.

(I am speaking with those terms in the "plain" English sense, although
they may have specific legal meanings in some contexts.)

> As I have been saying a lot in this thread (because there seem to be a
> lot of people in the Guix community not familiar that legal are not
> the same as social rules):
> -Just because you CAN do something doesn't mean you SHOULD.
> In the sense that yes somebody can probably harvest all my posts from
> activitypub and post them somewhere else,
> in practise they are an asshole tho and probably are going to be
> deferated pretty fast for breaking the social rules of common human
> decency :)

With something released under a Free Software license, calling someone
an "asshole" simply for using the permissions granted by that license,
by the very person who granted those permissions, starts to feel a bit
like a baited trap and honestly, maybe outright duplicitous. Certainly
rude, at the very least.

Again, that is different from some arbitrary post or video or cat
picture on the internet, which more likely than not has no explicit
permissions granted.

> TBH it seems you are not the only one in this thread not knowing that
> laws (legal rules of states) ie. the FSF licenses and work and
> whatever, are not the same as social rules.
> But given that Guix has a CoC and social rules on top of that I am
> hopeful :)

Well... free software ... is a bunch of social rules. Licenses are
social rules. Contracts are social rules. Laws are social rules.

Admittedly, a lot of the mechanics involved in law creation and
enforcement are dubious and suspect and weighted in favor of large,
wealthy and/or otherwise powerful entities...

I am not sure arguing about social vs. legal vs. whatever is even
really a useful direction... it almost misses the point entirely.

I would rather ask... what is the intention of the Free Software
movement? The licenses are merely imperfect tools to achieve those
aims, and a clever way to leverage some specific legal mechanisms, but
the licenses are not an end unto themselves.

For me personally, it is about creating a shared commons that can be
used to build healthy thriving local, regional, global and virtual
communities that do useful or interesting things... I dare dream that
some of those collaboration skills leak into other aspects of life too,
not just software!

I have a lot of doubts that the LLM training from SWH data is going to
further this vision for free software... while the overall work of SWH
most definitely does.

Given my crude understanding of how LLM training works, it seems hard
to imagine that it could actually produce models that comply with all
of the license terms of innumerable free software projects, some of
which have mutually incompatible terms. For just a handful of examples
that are incompatible with the GPL:

  https://www.gnu.org/licenses/license-list.html#GPLIncompatibleLicenses

So unless they are very extremely exceedingly excruciatingly careful
about not including incompatible licenses... I have significant doubts.
The incentives are just not there.

I am a bit disappointed with the very optimistic take SWH has regarding
LLMs for code:

  https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code/

Even with all the identifiers to show which code a model was trained
on, the whole point of a large model is that it is built from a huge
dataset... my guess is it takes significantly more effort to audit that
dataset than to create an LLM with it.

Which is to say license compliance, one of the few tools of the Free
Software movement, seems unlikely to be effective. It is barely
effective with more traditional software development.

In short, er, at length, I am really not sure what to do. I find the
opt-out/opt-in angle to be almost tangential.

I find all the hype, and more importantly, the active harm done with
LLMs, to be appalling. It is a very serious threat to free software,
various disadvantaged communities, and possibly the literal liveability
of our biggest commons so far, dear planet earth...

If some social pressure from the Guix community could improve things,
by all means, though I worry that it might be at best performative
rather than effective, especially if the pressure is placed N parties
removed from the source of the actual problem (e.g.
those irresponsibly training LLMs without respecting the licenses).

Aaaaaand... I have to cut myself off now. :)

live well,
  vagrant