From: Danny Milosavljevic
Subject: Re: License auditing
Date: Wed, 3 Aug 2016 19:55:11 +0200
Message-ID: <20160803195511.3f55fc92@scratchpost.org>
To: David Craven, guix-devel

On Wed, 3 Aug 2016 18:28:38 +0200
David Craven wrote:

> How can I tell the difference between a lgpl2.1 and lgpl2.1+ license?

Look for the "or later" wording: the "+" variant's notice says "either version 2.1 of the License, or (at your option) any later version".

> Is this a job that an automated tool could do? Detecting licenses
> included in a tarball?

I also wonder about that. Usually, the license text is just copied & pasted anyway, so it should be quite regular.

If there isn't one, I could write one which would basically, per source file,
- try to find an SPDX identifier; if that doesn't work:
- ignore newlines and "#", ";", "*" or "//" at the beginning of the line
- lex that into words, where a "word" is either [a-zA-Z0-9-]+ or [.,;]
- try to match the result 1:1 against all the licenses, similarly mapped
- if that didn't work, try to find signal words, guess the license, and print the difference in a short form.

I could write that program in maybe 2 hours, and find and extract all the official license texts in a few more hours. But does such a thing already exist? [It seems like an obvious thing to have, and I'm already writing many other things.]

A human would still have to review the non-1:1 matches, since there could always be strange exceptions in the README or elsewhere, but the majority of cases should work just fine.

See also (especially ), (also lists several license checkers; Fossology seems to be a whole web service which does that).
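
For illustration, the per-file steps listed above could be sketched in Python roughly as follows. This is only a sketch under assumptions: the function names, the `known` and `signals` dictionaries, and the lowercasing step are made up for the example; the SPDX regex captures a single identifier rather than a full expression; and the official license texts would still have to be collected separately.

```python
import difflib
import re

# All names below are hypothetical; only the lexing rules come from the message.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")
LEADER_RE = re.compile(r"^\s*(?:#+|;+|\*+|//+)\s*")  # comment leaders at line start
WORD_RE = re.compile(r"[a-zA-Z0-9-]+|[.,;]")         # a "word" as defined above


def tokenize(text):
    """Drop newlines and comment leaders, then lex the rest into words."""
    tokens = []
    for line in text.splitlines():
        line = LEADER_RE.sub("", line)
        # Lowercasing is an extra assumption, not part of the original plan.
        tokens.extend(WORD_RE.findall(line.lower()))
    return tokens


def contains(haystack, needle):
    """True if the token list `needle` occurs contiguously in `haystack`."""
    n = len(needle)
    if n == 0:
        return False
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))


def short_diff(ref, tokens):
    """List only the spans where the two token lists differ."""
    sm = difflib.SequenceMatcher(a=ref, b=tokens, autojunk=False)
    return [(tag, " ".join(ref[i1:i2]), " ".join(tokens[j1:j2]))
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def identify(text, known, signals):
    """Return (method, license_name, differences) for one source file.

    known:   license name -> token list of the reference text
    signals: license name -> set of signal words used for guessing
    """
    m = SPDX_RE.search(text)
    if m:
        return ("spdx", m.group(1), [])
    tokens = tokenize(text)
    for name, ref in known.items():
        if contains(tokens, ref):
            return ("exact", name, [])
    seen = set(tokens)
    best = max(signals, key=lambda n: len(signals[n] & seen), default=None)
    if best is None or not signals[best] & seen:
        return ("unknown", None, [])
    return ("guess", best, short_diff(known.get(best, []), tokens))
```

Anything that comes back as "guess" or "unknown" is what a human would still have to review. The "+" question is not handled separately here: a plain and an "or later" notice would simply be two different entries in `known`.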