all messages for Guix-related lists mirrored at yhetil.org
* License auditing
@ 2016-08-03 16:28 David Craven
  2016-08-03 17:55 ` Danny Milosavljevic
  2016-08-03 18:03 ` Leo Famulari
  0 siblings, 2 replies; 14+ messages in thread
From: David Craven @ 2016-08-03 16:28 UTC (permalink / raw)
  To: guix-devel

Hi!

How can I tell the difference between a lgpl2.1 and lgpl2.1+ license?

Is this a job that an automated tool could do? Detecting licenses
included in a tarball?

Cheers
David


* Re: License auditing
  2016-08-03 16:28 License auditing David Craven
@ 2016-08-03 17:55 ` Danny Milosavljevic
  2016-08-03 18:00   ` Jelle Licht
                     ` (2 more replies)
  2016-08-03 18:03 ` Leo Famulari
  1 sibling, 3 replies; 14+ messages in thread
From: Danny Milosavljevic @ 2016-08-03 17:55 UTC (permalink / raw)
  To: David Craven, guix-devel

On Wed, 3 Aug 2016 18:28:38 +0200
David Craven <david@craven.ch> wrote:

> How can I tell the difference between a lgpl2.1 and lgpl2.1+ license?

"or later"

> Is this a job that an automated tool could do? Detecting licenses
> included in a tarball?

I also wonder about that. Usually, the license text is just copied & pasted anyway, so it should be quite regular.

If there isn't one, I could write one which would basically, per source file,
- try to find SPDX identifier, if that doesn't work:
- ignore newline, "#" or ";" or "*" or "//" at the beginning of the line
- lex that into words, where "word" is either [a-zA-Z0-9-]+ or [.,;]
- try to match 1:1 against all the license texts, similarly mapped
- if that didn't work, try to find signal words and guess the license and print the difference in a short form.
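The steps listed above can be sketched roughly as follows. This is only an illustration, not an existing tool: the reference snippets and signal words are hypothetical, heavily shortened placeholders, the "1:1 match" is approximated by looking for the reference token sequence inside the file, and printing the difference is left out.

```python
import re

# Hypothetical, heavily shortened reference snippets; a real tool would load
# the full official license texts instead.
REFERENCE_TEXTS = {
    "MIT": "Permission is hereby granted, free of charge, to any person obtaining a copy",
    "ISC": "Permission to use, copy, modify, and/or distribute this software for any purpose",
}

# Hypothetical signal words, used only for the fallback guess.
SIGNAL_WORDS = {
    "GPL": ["general", "public", "license"],
    "BSD": ["redistribution", "binary", "forms"],
}

SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")
COMMENT_PREFIX_RE = re.compile(r"^\s*(?:#|;|\*|//)+\s?")
TOKEN_RE = re.compile(r"[a-zA-Z0-9-]+|[.,;]")


def tokenize(text):
    """Drop leading comment markers per line, then lex into words and [.,;]."""
    stripped = (COMMENT_PREFIX_RE.sub("", line) for line in text.splitlines())
    return [token.lower() for token in TOKEN_RE.findall(" ".join(stripped))]


def contains(haystack, needle):
    """Return True if the token list `needle` occurs contiguously in `haystack`."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def detect_license(source_text):
    """Return (license_name, how_it_was_found) for one source file."""
    spdx = SPDX_RE.search(source_text)
    if spdx:
        return spdx.group(1), "spdx"
    tokens = tokenize(source_text)
    for name, reference in REFERENCE_TEXTS.items():
        if contains(tokens, tokenize(reference)):
            return name, "exact"
    for name, words in SIGNAL_WORDS.items():
        if all(word in tokens for word in words):
            return name, "guess"  # a human should review these
    return None, "unknown"
```

Anything reported as "guess" or "unknown" would still go to a human reviewer.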

I could do that program in maybe 2 hours and find and extract all the official license texts in a few more hours. But does such a thing already exist? [Seems like something obvious to have and I'm writing many other things already.]

A human would still have to review the non-1:1 things - there could always be strange exceptions in the README or whatever - but the majority of cases should work just fine.

See also <https://spdx.org/licenses/> (especially <https://github.com/triplecheck/>), <http://www.sciencedirect.com/science/article/pii/S0164121216300905> (also lists several license checkers; Fossology seems to be a whole webservice which does that).


* Re: License auditing
  2016-08-03 17:55 ` Danny Milosavljevic
@ 2016-08-03 18:00   ` Jelle Licht
  2016-08-03 18:05   ` Leo Famulari
  2016-08-03 18:05   ` David Craven
  2 siblings, 0 replies; 14+ messages in thread
From: Jelle Licht @ 2016-08-03 18:00 UTC (permalink / raw)
  To: Danny Milosavljevic; +Cc: guix-devel, David Craven


Something like this could be quite convenient.

The following spdx->guix license symbol converter
might save you some time:
http://paste.lisp.org/display/322105


- Jelle






* Re: License auditing
  2016-08-03 16:28 License auditing David Craven
  2016-08-03 17:55 ` Danny Milosavljevic
@ 2016-08-03 18:03 ` Leo Famulari
  2016-08-03 20:42   ` Ludovic Courtès
  1 sibling, 1 reply; 14+ messages in thread
From: Leo Famulari @ 2016-08-03 18:03 UTC (permalink / raw)
  To: David Craven; +Cc: guix-devel

On Wed, Aug 03, 2016 at 06:28:38PM +0200, David Craven wrote:
> Hi!
> 
> How can I tell the difference between a lgpl2.1 and lgpl2.1+ license?

The license headers in the source files will say if they are licensed
under version 2.1 or later. Something like this:

"...either version 2.1 of the License, or (at your option) any later
version."
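As a rough sketch of how a script might look for that wording: the function names are made up for illustration, the header is simply assumed to be an LGPL notice, and the patterns only cover the standard phrasing quoted above.

```python
import re

OR_LATER_RE = re.compile(
    r"either\s+version\s+(\d+(?:\.\d+)?)\s+of\s+the\s+License,?\s+or\s+"
    r"\(at\s+your\s+option\)\s+any\s+later\s+version",
    re.IGNORECASE,
)
VERSION_RE = re.compile(r"version\s+(\d+(?:\.\d+)?)\s+of\s+the\s+License", re.IGNORECASE)


def normalise(header):
    """Strip leading comment markers and collapse all whitespace to single spaces."""
    lines = (re.sub(r"^\s*(?:#|;|\*|//)+\s?", "", line) for line in header.splitlines())
    return " ".join(" ".join(lines).split())


def lgpl_symbol(header):
    """Guess the Guix-style symbol (e.g. 'lgpl2.1' or 'lgpl2.1+') for an LGPL header.

    Assumption: the caller already knows the header is an LGPL notice.
    """
    text = normalise(header)
    match = OR_LATER_RE.search(text)
    if match:
        return f"lgpl{match.group(1)}+"
    match = VERSION_RE.search(text)
    if match:
        return f"lgpl{match.group(1)}"
    return None
```

A header wrapped across comment lines still matches, because the comment markers are removed before the search.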

I've heard that if the only license information is a copy of the full
license (for example, in LICENSE or COPYING) and the files have no
license headers, then the "or later" part is implied, but I'm not sure.

> Is this a job that an automated tool could do? Detecting licenses
> included in a tarball?

A tool might be able to suggest something, but I think that it will
always require human inspection. And we only have to do this inspection
once per package version, on behalf of everybody else that uses the
distribution.


* Re: License auditing
  2016-08-03 17:55 ` Danny Milosavljevic
  2016-08-03 18:00   ` Jelle Licht
@ 2016-08-03 18:05   ` Leo Famulari
  2016-08-03 18:05   ` David Craven
  2 siblings, 0 replies; 14+ messages in thread
From: Leo Famulari @ 2016-08-03 18:05 UTC (permalink / raw)
  To: Danny Milosavljevic; +Cc: guix-devel, David Craven

On Wed, Aug 03, 2016 at 07:55:11PM +0200, Danny Milosavljevic wrote:
> A human would still have to review the non-1:1 things - there could
> always be strange exceptions in the README or whatever - but the
> majority of cases should work just fine.

There could also be binaries with no source code, some code with a
unique license, or countless other ways to confuse a license parser.


* Re: License auditing
  2016-08-03 17:55 ` Danny Milosavljevic
  2016-08-03 18:00   ` Jelle Licht
  2016-08-03 18:05   ` Leo Famulari
@ 2016-08-03 18:05   ` David Craven
  2016-08-03 18:15     ` David Craven
  2 siblings, 1 reply; 14+ messages in thread
From: David Craven @ 2016-08-03 18:05 UTC (permalink / raw)
  To: Danny Milosavljevic; +Cc: guix-devel

>> How can I tell the difference between a lgpl2.1 and lgpl2.1+ license?
>"or later"

Yes, I get that, but does it explicitly say the words "or later" in the license
text? What about when there are lgpl2, lgpl2.1 and lgpl3 license files in
the repo? Is that (list lgpl2.0 lgpl2.1 lgpl3) or lgpl2.0+?

> I could do that program in maybe 2 hours and find and extract all the
> official license texts in a few more hours. But does such a thing already
> exist? [Seems like something obvious to have and I'm writing many other
> things already.]

I only found this, which only detects 3 license types.
https://github.com/tantalor/detect-license


* Re: License auditing
  2016-08-03 18:05   ` David Craven
@ 2016-08-03 18:15     ` David Craven
  0 siblings, 0 replies; 14+ messages in thread
From: David Craven @ 2016-08-03 18:15 UTC (permalink / raw)
  To: Danny Milosavljevic, Leo Famulari; +Cc: guix-devel

> There could also be binaries with no source code, some code with a
> unique license, or countless other ways to confuse a license parser.

Well we do have a sizeable existing test-suite so that's a plus...

> "...either version 2.1 of the License, or (at your option) any later
> version."

That answers my question, thank you!


* Re: License auditing
  2016-08-03 18:03 ` Leo Famulari
@ 2016-08-03 20:42   ` Ludovic Courtès
  2016-08-03 21:11     ` Alex Griffin
  0 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2016-08-03 20:42 UTC (permalink / raw)
  To: Leo Famulari; +Cc: guix-devel, David Craven

Howdy!

Leo Famulari <leo@famulari.name> skribis:

> I've heard that if the only license information is a copy of the full
> license (for example, in LICENSE or COPYING) and the files have no
> license headers, then the "or later" part is implied, but I'm not sure.

In reality, the GNU licenses permit the recipient to choose any version
of the license.  For instance, Section 14 of GPLv3 reads:

  If the Program does not specify a version number of the GNU General
  Public License, you may choose any version ever published by the Free
  Software Foundation.

However, in Guix we encode such cases as ‘gpl3+’ (or similar), rather
than ‘gpl1+’.

Ludo’.


* Re: License auditing
  2016-08-03 20:42   ` Ludovic Courtès
@ 2016-08-03 21:11     ` Alex Griffin
  2016-08-03 22:59       ` David Craven
  2016-08-04 14:23       ` Ludovic Courtès
  0 siblings, 2 replies; 14+ messages in thread
From: Alex Griffin @ 2016-08-03 21:11 UTC (permalink / raw)
  To: Ludovic Courtès, Leo Famulari; +Cc: guix-devel, David Craven

On Wed, Aug 3, 2016, at 03:42 PM, Ludovic Courtès wrote:
> However, in Guix we encode such cases as ‘gpl3+’ (or similar), rather
> than ‘gpl1+’.

That seems wrong and confusing. It means that if I'm writing a GPLv2
program, for example, then I cannot rely on Guix to search for legally
compatible libraries to use. It also means we cannot implement a tool to
automatically flag Guix package dependencies for possible license
violations.
-- 
Alex Griffin
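To illustrate why the choice between 'gpl1+' and 'gpl3+' matters for such a tool, here is a deliberately simplified sketch. It assumes only GPL-family symbols of the form gplN or gplN+, treats version 3 as the newest, and ignores every other license and exception; the function names are hypothetical.

```python
import re

SYMBOL_RE = re.compile(r"^gpl(\d+)(\+?)$")


def usable_versions(symbol, newest=3):
    """Return the set of GPL versions a symbol such as 'gpl2' or 'gpl1+' allows."""
    match = SYMBOL_RE.match(symbol)
    if not match:
        raise ValueError(f"not a GPL-family symbol: {symbol}")
    version, plus = int(match.group(1)), match.group(2)
    return set(range(version, newest + 1)) if plus else {version}


def can_use(program, library):
    """True if at least one GPL version covers both the program and the library."""
    return bool(usable_versions(program) & usable_versions(library))
```

With this model, a GPLv2-only program can use a library recorded as 'gpl1+' but not one recorded as 'gpl3+', so the same upstream tarball gives a different answer depending on how it was encoded.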


* Re: License auditing
  2016-08-03 21:11     ` Alex Griffin
@ 2016-08-03 22:59       ` David Craven
  2016-08-04 14:23       ` Ludovic Courtès
  1 sibling, 0 replies; 14+ messages in thread
From: David Craven @ 2016-08-03 22:59 UTC (permalink / raw)
  To: Alex Griffin, Danny Milosavljevic; +Cc: guix-devel

I found a promising package to help with license auditing. It's not
perfect judging from the bug reports, but it seems pretty nice. It is
the only option I found that is intended for scripted usage (it has a
nice CLI). I'll package it tomorrow. It would be interesting to write a
plugin for Guix to see how its findings compare to the licenses
declared in GuixSD.

[0] https://github.com/nexB/scancode-toolkit/blob/develop/src/scancode/cli.py#L204
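A comparison of that kind could start from something like the sketch below. Both inputs are hypothetical dictionaries mapping a package name to a collection of license symbols; how they would be obtained from Guix or from the scanner is not shown, and the scanner's results are assumed to have been translated to the same symbols already.

```python
def compare_licenses(declared, detected):
    """Report, per package, licenses that were found but not declared and vice versa."""
    report = {}
    for package in sorted(declared.keys() | detected.keys()):
        want = set(declared.get(package, ()))
        found = set(detected.get(package, ()))
        if want != found:
            report[package] = {
                "undeclared": sorted(found - want),  # detected but not in the package definition
                "undetected": sorted(want - found),  # declared but not seen by the scanner
            }
    return report


# Hypothetical example data, for illustration only.
declared = {"foo": ["lgpl2.1+"], "bar": ["gpl3+"]}
detected = {"foo": ["lgpl2.1+"], "bar": ["gpl3+", "expat"]}
mismatches = compare_licenses(declared, detected)
```

Packages whose declared and detected sets agree are left out of the report, so only the cases that need human review remain.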


* Re: License auditing
  2016-08-03 21:11     ` Alex Griffin
  2016-08-03 22:59       ` David Craven
@ 2016-08-04 14:23       ` Ludovic Courtès
  2016-08-04 14:40         ` Alex Griffin
  1 sibling, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2016-08-04 14:23 UTC (permalink / raw)
  To: Alex Griffin; +Cc: guix-devel, David Craven

Hi,

Alex Griffin <a@ajgrf.com> skribis:

> On Wed, Aug 3, 2016, at 03:42 PM, Ludovic Courtès wrote:
>> However, in Guix we encode such cases as ‘gpl3+’ (or similar), rather
>> than ‘gpl1+’.
>
> That seems wrong and confusing.

Strictly speaking it’s wrong, but I think it better reflects the intent
of the authors (I think authors who throw a GPLv3 ‘COPYING’ file without
bothering to add file headers probably think that GPLv3 and maybe later
versions apply, but not previous versions.)

> It means that if I'm writing a GPLv2 program, for example, then I
> cannot rely on Guix to search for legally compatible libraries to
> use. It also means we cannot implement a tool to automatically flag
> Guix package dependencies for possible license violations.

I suppose many package violations could be detected using Guix, but
you’re right that subtle cases like this one can go undetected.

In the end, we’re talking about legal documents whose interpretation
isn’t as formal as we would like.  So I suspect that no single tool can
provide what you want—there is no “license calculus”.  Tools like
Fossology go a long way, but AFAIK they are no substitute for proper
manual auditing.

Thanks,
Ludo’.


* Re: License auditing
  2016-08-04 14:23       ` Ludovic Courtès
@ 2016-08-04 14:40         ` Alex Griffin
  0 siblings, 0 replies; 14+ messages in thread
From: Alex Griffin @ 2016-08-04 14:40 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Thu, Aug 4, 2016, at 09:23 AM, Ludovic Courtès wrote:
> Strictly speaking it’s wrong, but I think it better reflects the intent
> of the authors (I think authors who throw a GPLv3 ‘COPYING’ file without
> bothering to add file headers probably think that GPLv3 and maybe later
> versions apply, but not previous versions.)

Ah, I guess that seems more reasonable when the whole situation is laid
out.

> I suppose many package violations could be detected using Guix, but
> you’re right that subtle cases like this one can go undetected.
> 
> In the end, we’re talking about legal documents whose interpretation
> isn’t as formal as we would like.  So I suspect that no single tool can
> provide what you want—there is no “license calculus”.  Tools like
> Fossology go a long way, but AFAIK they are no substitute for proper
> manual auditing.

I know it can't and shouldn't be fully automated, but we can still build
useful tools to help us.

-- 
Alex Griffin


* Re: License auditing
@ 2016-08-04 17:41 Philippe Ombredanne
  0 siblings, 0 replies; 14+ messages in thread
From: Philippe Ombredanne @ 2016-08-04 17:41 UTC (permalink / raw)
  To: guix-devel

On Wed, 3 Aug 2016 19:55:11 +0200, Danny Milosavljevic wrote:
> See also <https://spdx.org/licenses/> (especially
> <https://github.com/triplecheck/>),
> <http://www.sciencedirect.com/science/article/pii/S0164121216300905> (also
> lists several license checkers; Fossology seems to be a whole webservice which
> does that).

On Wed, 3 Aug 2016 14:05:06 -0400, Leo Famulari wrote:
> There could also be binaries with no source code, some code with a
> unique license, or countless other ways to confuse a license parser.

On Thu, 4 Aug 2016 00:59:52 +0200, David Craven wrote:
> I found a promising package to help with license auditing. It's not
> perfect judging from the bug reports, but it seems pretty nice. It is
> the only option I found which is intended for scripted usage (has a
> nice cli interface). I'll package it tomorrow. Interesting would be to
> write a plugin for guix to see how its findings compare to the
> licenses declared in guixsd.
> [0]
> https://github.com/nexB/scancode-toolkit/blob/develop/src/scancode/cli.py#L204


Hello guixers!
Scancode maintainer here!
Scancode detects licenses and more, and it tries hard not to get
confused by the countless ways licenses are written or mentioned.

It also does not require any specific install beyond a Python
interpreter, runs from the command line, and could come in handy there.

There are bugs, but I track these actively and fix them eventually!
I am curious whether this breaks a few assumptions in Guix or not.
For instance, I vendor all third-party dependencies such that a simple
tarball has everything it needs to run except for a Python interpreter.
But this could eventually be broken down into a Python package and
several library dependencies if you want to go this route in Guix.

PS: I am also one of the SPDX co-founders.

I would be glad to help in any way I can.
-- 
Cordially
Philippe Ombredanne


* Re: License auditing
@ 2016-08-04 18:34 Philippe Ombredanne
  0 siblings, 0 replies; 14+ messages in thread
From: Philippe Ombredanne @ 2016-08-04 18:34 UTC (permalink / raw)
  To: guix-devel

On Wed, 3 Aug 2016 19:55:11 +0200, Danny Milosavljevic wrote:
> See also <https://spdx.org/licenses/> (especially
> <https://github.com/triplecheck/>),
> <http://www.sciencedirect.com/science/article/pii/S0164121216300905> (also
> lists several license checkers; Fossology seems to be a whole webservice which
> does that).

On Wed, 3 Aug 2016 14:05:06 -0400, Leo Famulari wrote:
> There could also be binaries with no source code, some code with a
> unique license, or countless other ways to confuse a license parser.

On Thu, 4 Aug 2016 00:59:52 +0200, David Craven wrote:
> I found a promising package to help with license auditing. It's not
> perfect judging from the bug reports, but it seems pretty nice. It is
> the only option I found which is intended for scripted usage (has a
> nice cli interface). I'll package it tomorrow. Interesting would be to
> write a plugin for guix to see how its findings compare to the
> licenses declared in guixsd.
> [0]
> https://github.com/nexB/scancode-toolkit/blob/develop/src/scancode/cli.py#L204


Hello guixers!
Scancode maintainer here.
Scancode detects licenses and more and it tries hard not to get
(too) confused by the countless ways licenses are written or mentioned.

It also does not require any specific install beyond a Python
interpreter, runs from the command line, and could come in
handy there.

There are bugs but I track these actively and fix them eventually.

I am curious if this breaks a few assumptions in Guix or not:
I vendor all third-parties such that a simple tarball has
everything it needs to run except for a Python interpreter.

But this could eventually be broken down into a Python package
and several library dependencies to package it in Guix.
See also this conversation with David [1]

PS: I am also one of the SPDX co-founders.

I will be glad to help in any modest way I can.

 [1] https://github.com/nexB/scancode-toolkit/issues/288

-- 
Cordially
Philippe Ombredanne

