From: Jelle Licht
Subject: Re: License auditing
Date: Wed, 3 Aug 2016 20:00:38 +0200
To: Danny Milosavljevic
Cc: guix-devel, David Craven
In-Reply-To: <20160803195511.3f55fc92@scratchpost.org>

Something like this could be quite convenient.

The following spdx->guix license symbol converter might save you some time:
http://paste.lisp.org/display/322105
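For illustration only (this is not the code behind the paste link, which isn't reproduced here), a partial SPDX-to-Guix table might look like the following in Python. The selection of entries is an assumption; the values are names of license variables exported by Guix's (guix licenses) module, and both the older SPDX spelling ("LGPL-2.1+") and the newer "-only"/"-or-later" spelling are accepted.

```python
from typing import Optional

# Partial, illustrative table: SPDX short identifier -> Guix license name.
# Older SPDX spellings ("GPL-2.0+") and newer ones ("GPL-2.0-or-later") both appear.
SPDX_TO_GUIX = {
    "GPL-2.0": "gpl2",
    "GPL-2.0-only": "gpl2",
    "GPL-2.0+": "gpl2+",
    "GPL-2.0-or-later": "gpl2+",
    "GPL-3.0": "gpl3",
    "GPL-3.0-only": "gpl3",
    "GPL-3.0+": "gpl3+",
    "GPL-3.0-or-later": "gpl3+",
    "LGPL-2.1": "lgpl2.1",
    "LGPL-2.1-only": "lgpl2.1",
    "LGPL-2.1+": "lgpl2.1+",
    "LGPL-2.1-or-later": "lgpl2.1+",
    "MIT": "expat",          # Guix calls the common MIT license "expat"
    "Apache-2.0": "asl2.0",
    "BSD-3-Clause": "bsd-3",
    "MPL-2.0": "mpl2.0",
}


def spdx_to_guix(spdx_id: str) -> Optional[str]:
    """Return the Guix license name for a single SPDX identifier, or None."""
    return SPDX_TO_GUIX.get(spdx_id.strip())
```

Unknown identifiers, and compound expressions such as "GPL-2.0+ OR MIT", return None here, so a human still has to pick the license by hand in those cases.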


- Jelle



2016-08-03 19:55 GMT+02:00 Danny Milosavljevic <dannym@scratchpost.org>:
On Wed, 3 Aug 2016 18:28:38 +0200
David Craven <david@craven.ch> wrote:

> How can I tell the difference between an lgpl2.1 and an lgpl2.1+ license?
"or later"

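In a GPL-family notice, the "+" corresponds to the clause "or (at your option) any later version". A minimal sketch of telling the variants apart, assuming the notice has already been extracted from the file with comment markers removed; the function names are hypothetical, and only the GPL, LGPL and AGPL families are handled:

```python
import re
from typing import Optional

# First "version N" or "version N.M" in the notice.
VERSION_RE = re.compile(r"\bversion\s+(\d+(?:\.\d+)?)", re.IGNORECASE)
# The clause that turns e.g. lgpl2.1 into lgpl2.1+.
OR_LATER_RE = re.compile(
    r"or\s+\(at\s+your\s+option\)\s+any\s+later\s+version", re.IGNORECASE
)


def license_family(text: str) -> Optional[str]:
    lowered = text.lower()
    # Check the more specific names first: both contain "general public license".
    if ("lesser general public license" in lowered
            or "library general public license" in lowered):
        return "lgpl"
    if "affero general public license" in lowered:
        return "agpl"
    if "general public license" in lowered:
        return "gpl"
    return None


def guix_gpl_symbol(notice: str) -> Optional[str]:
    """Map a GPL-family notice to a Guix-style name such as "lgpl2.1+"."""
    text = " ".join(notice.split())  # collapse line breaks and repeated spaces
    family = license_family(text)
    match = VERSION_RE.search(text)
    if family is None or match is None:
        return None
    version = match.group(1)
    if family == "lgpl" and version == "2":
        version = "2.0"  # the Library GPL v2 is called lgpl2.0 in (guix licenses)
    symbol = family + version
    return symbol + "+" if OR_LATER_RE.search(text) else symbol
```

Taking the first "version N" in the notice is a simplification; a notice that mentions some other version number earlier would need a smarter rule.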
> Is this a job that an automated tool could do? Detecting licenses
> included in a tarball?

I also wonder about that. Usually, the license text is just copied & pasted anyway, so it should be quite regular.

If there isn't one, I could write one that would basically do the following for each source file:
- try to find an SPDX identifier; if that doesn't work:
- ignore newlines and any "#", ";", "*" or "//" at the beginning of each line
- lex that into words, where a "word" is either [a-zA-Z0-9-]+ or [.,;]
- try to match that 1:1 against all the license texts, normalized the same way
- if that doesn't work, look for signal words, guess the license, and print a short summary of the differences.
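The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not an existing tool: the function names, the signal-phrase table and the lower-casing of tokens are assumptions, and `licenses` is a hypothetical dict mapping license names to their official texts (which would still have to be collected, e.g. from the SPDX license list).

```python
import re
from typing import Dict, List, Optional, Tuple

SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+)")
COMMENT_PREFIX_RE = re.compile(r"^\s*(?:#|;|\*|//)+")
TOKEN_RE = re.compile(r"[A-Za-z0-9-]+|[.,;]")

# Hypothetical signal phrases, checked in order (LGPL before GPL).
SIGNAL_WORDS = {
    "lesser general public license": "LGPL (version unknown)",
    "general public license": "GPL (version unknown)",
    "apache license": "Apache (version unknown)",
    "permission is hereby granted , free of charge": "MIT/Expat-style",
}


def find_spdx_id(text: str) -> Optional[str]:
    match = SPDX_RE.search(text)
    if match is None:
        return None
    # Drop a trailing C-style comment closer, e.g. "MIT */".
    return re.sub(r"\s*\*/\s*$", "", match.group(1)).strip()


def strip_comment_markers(text: str) -> str:
    return "\n".join(COMMENT_PREFIX_RE.sub("", line) for line in text.splitlines())


def tokenize(text: str) -> List[str]:
    # Lower-casing is an extra assumption, not part of the plan above.
    return [t.lower() for t in TOKEN_RE.findall(strip_comment_markers(text))]


def contains_tokens(haystack: List[str], needle: List[str]) -> bool:
    # Tokens never contain spaces, so joining with spaces keeps word boundaries.
    return f" {' '.join(needle)} " in f" {' '.join(haystack)} "


def identify(source: str, licenses: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Return (method, license); method is 'spdx', 'exact', 'guess' or 'unknown'."""
    spdx = find_spdx_id(source)
    if spdx:
        return "spdx", spdx
    tokens = tokenize(source)
    for name, text in licenses.items():
        if contains_tokens(tokens, tokenize(text)):
            return "exact", name
    joined = " ".join(tokens)
    for phrase, guess in SIGNAL_WORDS.items():
        if phrase in joined:
            return "guess", guess
    return "unknown", None
```

Looking for the license's tokens as a contiguous run (rather than requiring the whole file to match) lets a license header at the top of a source file count as a 1:1 match; anything reflowed or edited falls through to the signal-word guess and needs a human look.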

I could write that program in maybe two hours, and find and extract all the official license texts in a few more hours. But does such a thing already exist? [It seems like something obvious to have, and I'm already writing many other things.]

A human would still have to review the cases that aren't 1:1 matches (there could always be strange exceptions in the README or elsewhere), but the majority of cases should work just fine.
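For the cases that need human review, Python's standard difflib can produce the kind of short difference report described in the last step of the list. A minimal sketch, assuming Danny's tokenization (words [a-zA-Z0-9-]+ and the punctuation marks . , ;), lower-cased; `short_diff` is a hypothetical helper name:

```python
import difflib
import re
from typing import List

TOKEN_RE = re.compile(r"[A-Za-z0-9-]+|[.,;]")


def tokenize(text: str) -> List[str]:
    return [t.lower() for t in TOKEN_RE.findall(text)]


def short_diff(reference: str, candidate: str, context: int = 3) -> List[str]:
    """Summarize where a candidate text departs from a reference license text."""
    ref, cand = tokenize(reference), tokenize(candidate)
    matcher = difflib.SequenceMatcher(a=ref, b=cand, autojunk=False)
    report = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # A few reference tokens before the change, to locate it for the reviewer.
        before = " ".join(ref[max(0, i1 - context):i1])
        removed = " ".join(ref[i1:i2])
        added = " ".join(cand[j1:j2])
        report.append(f"{tag} after '...{before}': -[{removed}] +[{added}]")
    return report
```

Comparing tokens rather than raw lines means reflowed text produces no noise; only real wording changes, such as a dropped "or later" clause, show up in the report.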

See also <https://spdx.org/licenses/> (especially <https://github.com/triplecheck/>) and <http://www.sciencedirect.com/science/article/pii/S0164121216300905>, which also lists several license checkers; Fossology seems to be a whole web service that does this.

