unofficial mirror of guix-devel@gnu.org 
From: Ricardo Wurmus <rekado@elephly.net>
To: "Ludovic Courtès" <ludovic.courtes@inria.fr>
Cc: guix-devel@gnu.org
Subject: Re: Accuracy of importers?
Date: Thu, 28 Oct 2021 12:25:56 +0000
Message-ID: <87y26dxqz9.fsf@elephly.net>
In-Reply-To: <878ryd8we4.fsf@inria.fr>



Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> Hello Guix!
>
> As I’m preparing my PackagingCon talk and wondering how language package
> managers could make our lives easier, I thought it’d be interesting to
> know how well our importers are doing.
>
> My understanding is that most of them require manual intervention, i.e.,
> one has to tweak what ‘guix import’ produces, even if we ignore
> synopsis/description/license, to set the right inputs, etc.  If we were
> to estimate the fraction of imported packages for which manual changes
> are needed, what would it look like?
>
>    importer     fraction of imported packages needing changes
[…]
>    cran         5% (Ricardo? Simon? seems to almost always work?)

As Lars and Simon wrote, the importers work *really* well for
both CRAN and Bioconductor, so much so that I’m using them in the
background here:

https://git.elephly.net/gitweb.cgi?p=software/r-guix-install.git;a=blob;f=guix-install.R;h=2766aa1f2d248a8ed2a4eb4c3244b85574d326e2;hb=HEAD

The biggest annoyance is the missing “license:” prefix on license
names when packaging things for gnu/packages/cran.scm or
gnu/packages/bioconductor.scm, both of which import (guix licenses)
with that prefix.  Descriptions need regular clean-up work (e.g. to
complete sentences), even though we’re using some heuristics to fix
the most common stylistic problems.  It’s really not a big deal,
though.
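
One way to paper over the prefix issue would be to post-process the
importer's output.  Below is a minimal Guile sketch, assuming the
importer emits bare license symbols such as (license gpl3+) or
(license (list gpl2 gpl3)); ‘add-license-prefix’ and
‘prefix-license’ are hypothetical helpers, not part of Guix:

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper: turn a bare license symbol such as `gpl3+'
;; into `license:gpl3+', leaving already-prefixed symbols untouched.
(define (prefix-license sym)
  (let ((str (symbol->string sym)))
    (if (string-prefix? "license:" str)
        sym
        (string->symbol (string-append "license:" str)))))

;; Hypothetical helper: walk an imported package s-expression and add
;; the prefix inside its (license ...) field, as expected by modules
;; that import (guix licenses) with #:prefix license:.
(define (add-license-prefix sexp)
  (match sexp
    ;; (license gpl2+) => (license license:gpl2+)
    (('license (? symbol? lic))
     `(license ,(prefix-license lic)))
    ;; (license (list gpl2 gpl3)) => (license (list license:gpl2 license:gpl3))
    (('license ('list (? symbol? lics) ...))
     `(license (list ,@(map prefix-license lics))))
    ;; Recurse into any other proper list, e.g. the (package ...) form.
    ((items ...)
     (map add-license-prefix items))
    ;; Strings, numbers and other atoms are returned unchanged.
    (other other)))

(add-license-prefix
 '(package (name "r-foo") (license (list gpl2 gpl3))))
;; => (package (name "r-foo") (license (list license:gpl2 license:gpl3)))
```

Symbols that already carry the prefix are left alone, so running it
twice on the same expression is harmless.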

The biggest missing feature is recursive import of dependencies
hosted in Git or Mercurial repositories (with “-r -a git” or
“-r -a hg”).  That is, a package on GitHub that declares a
dependency on another package that’s also only hosted on GitHub
will fail to import that dependency.  This is pretty rare, but it
happens with experimental bioinfo software.

>    texlive      (Ricardo? Thiago? Marius?)

This one is not usable.  I’d even add “at all”.  I keep announcing 
that one day I’ll replace it with a new importer, but that new 
importer just isn’t ready yet.

> What about licensing info: which ones provide accurate licensing info?
> My guess:
>
>    gnu
>    pypi
>    cpan
>    cran

The CRAN importer is as accurate as upstream allows.  CRAN
requires a free license; Bioconductor only requires a license
declaration (there have been very few cases where the license was
not correct, but a number of cases where the license was non-free,
such as the Artistic 1.0 license).  Bioconductor is sometimes
sneaky: the R code is free, but a necessary library is not.

>    texlive

Pretty terrible.  The license declaration is generally too vague. 
Licenses are often declared without a version number, and sometimes
it’s just some generic “free” license.  A new importer based on 
texlive.tlpdb would not improve this by much, because the upstream 
declarations are just spotty and unreliable.
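
To illustrate why better parsing alone would not help, here is a
minimal Guile sketch of mapping the catalogue-license field to
(guix licenses) names.  It assumes the field holds space-separated
CTAN-style keys such as “lppl1.3c”, “gpl” or “other-free”; the keys
listed and the Guix license chosen for each are illustrative
guesses, and both procedures are hypothetical, not part of Guix.
Unversioned or generic keys yield #f, so the package still needs a
human:

```scheme
(use-modules (ice-9 match)
             (srfi srfi-1))   ; for `every'

;; Hypothetical mapping from a few catalogue-license keys to names
;; from (guix licenses).  Keys that are ambiguous (no version) or
;; generic map to #f, meaning "check the license by hand".
(define (tlpdb-license->guix key)
  (match key
    ("lppl1.3c"   'lppl1.3c)
    ("gpl2"       'gpl2)
    ("gpl3"       'gpl3)
    ("pd"         'public-domain)
    ("knuth"      'knuth)
    ("gpl"        #f)        ; which GPL version?  "or later"?
    ("other-free" #f)        ; some free license, terms unknown
    (_            #f)))      ; anything not listed above

;; Map VALUE, assumed to be a space-separated list of keys, to a list
;; of license symbols, or #f if any key needs manual review.
(define (tlpdb-licenses->guix value)
  (let ((licenses (map tlpdb-license->guix (string-tokenize value))))
    (and (every identity licenses) licenses)))

(tlpdb-licenses->guix "lppl1.3c gpl2")  ; => (lppl1.3c gpl2)
(tlpdb-licenses->guix "gpl")            ; => #f
```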

-- 
Ricardo


PS: attached is a rough WIP patch of what I had been using to 
import new texlive stuff.


[-- Attachment #2: texlive-import.diff --]
[-- Type: text/x-patch, Size: 6523 bytes --]

diff --git a/guix/import/texlive.scm b/guix/import/texlive.scm
index 18d8b95ee0..b94aa1cf40 100644
--- a/guix/import/texlive.scm
+++ b/guix/import/texlive.scm
@@ -19,10 +19,12 @@
 
 (define-module (guix import texlive)
   #:use-module (ice-9 match)
+  #:use-module (ice-9 rdelim)
   #:use-module (sxml simple)
   #:use-module (sxml xpath)
   #:use-module (srfi srfi-11)
   #:use-module (srfi srfi-1)
+  #:use-module (srfi srfi-2)
   #:use-module (srfi srfi-26)
   #:use-module (srfi srfi-34)
   #:use-module (web uri)
@@ -125,9 +127,9 @@ (define (fetch-sxml name)
       (xml->sxml (http-fetch url)
                  #:trim-whitespace? #t))))
 
-(define (guix-name component name)
+(define (guix-name name)
   "Return a Guix package name for a given Texlive package NAME."
-  (string-append "texlive-" component "-"
+  (string-append "texlive-"
                  (string-map (match-lambda
                                (#\_ #\-)
                                (#\. #\-)
@@ -186,12 +188,123 @@ (define (sxml-value path)
                      ((lst ...) `(list ,@lst))
                      (license license)))))))
 
+(define tlpdb
+  (memoize
+   (lambda ()
+     (let ((file "/home/rekado/dev/gx/branches/master/texlive.tlpdb")
+           (fields
+            '((name     . string)
+              (shortdesc . string)
+              (longdesc . string)
+              (catalogue-license . string)
+              (catalogue-ctan . string)
+              (srcfiles . list)
+              (runfiles . list)
+              (docfiles . list)
+              (depend   . list)))
+           (record
+            (lambda* (key value alist #:optional (type 'string))
+              (let ((new
+                     (or (and=> (assoc-ref alist key)
+                                (lambda (existing)
+                                  (cond
+                                   ((eq? type 'string)
+                                    (string-append existing " " value))
+                                   ((eq? type 'list)
+                                    (cons value existing)))))
+                         (cond
+                          ((eq? type 'string)
+                           value)
+                          ((eq? type 'list)
+                           (list value))))))
+                (acons key new (alist-delete key alist))))))
+       (call-with-input-file file
+         (lambda (port)
+           (let loop ((all (list))
+                      (current (list))
+                      (last-property #false))
+             (let ((line (read-line port)))
+               (cond
+                ((eof-object? line) all)
+
+                ;; End of record.
+                ((string-null? line)
+                 (loop (cons (cons (assoc-ref current 'name) current)
+                             all)
+                       (list) #false))
+
+                ;; Continuation of a list
+                ((and (string-prefix? " " line) last-property)
+                 ;; Erase optional second part of list values like
+                 ;; "details=Readme" for files
+                 (let ((plain-value (first
+                                     (string-split
+                                      (string-trim-both line) #\space))))
+                   (loop all (record last-property
+                                     plain-value
+                                     current
+                                     'list)
+                         last-property)))
+                (else
+                 (or (and-let* ((space (string-index line #\space))
+                                (key   (string->symbol (string-take line space)))
+                                (value (string-drop line (1+ space)))
+                                (field-type (assoc-ref fields key)))
+                       ;; Erase second part of list keys like "size=29"
+                       (if (eq? field-type 'list)
+                           (loop all current key)
+                           (loop all (record key value current field-type) key)))
+                     (loop all current #false))))))))))))
+
+(define (files->directories files)
+  (map (cut string-join <> "/" 'suffix)
+       (delete-duplicates (map (lambda (file)
+                                 (drop-right (string-split file #\/) 1))
+                               files)
+                          equal?)))
+
+(define (tlpdb->package name)
+  (and-let* ((data (assoc-ref (tlpdb) name))
+             (dirs (files->directories
+                    (append (or (assoc-ref data 'docfiles) (list))
+                            (or (assoc-ref data 'runfiles) (list))
+                            (or (assoc-ref data 'srcfiles) (list))))))
+    (pk data)
+    ;; TODO
+    `(package
+       (name ,(guix-name name))
+       (version (number->string %texlive-revision))
+       (source (texlive-origin name version
+                               ',dirs
+                               (base32
+                                "TODO"
+                                #;
+                                ,(bytevector->nix-base32-string
+                                  (let-values (((port get-hash) (open-sha256-port)))
+                                    (write-file checkout port)
+                                    (force-output port)
+                                    (get-hash))))))
+       (build-system texlive-build-system)
+       (arguments ,`(,'quote (#:tex-directory "TODO")))
+       ,@(or (and=> (assoc-ref data 'depend)
+                    (lambda (inputs)
+                      `((propagated-inputs ,inputs))))
+             '())
+       ,@(or (and=> (assoc-ref data 'catalogue-ctan)
+                    (lambda (url)
+                      `((home-page ,(string-append "https://ctan.org" url)))))
+             '((home-page "https://www.tug.org/texlive/")))
+       (synopsis ,(assoc-ref data 'shortdesc))
+       (description ,(beautify-description
+                      (assoc-ref data 'longdesc)))
+       (license ,(string->license
+                  (assoc-ref data 'catalogue-license))))))
+
 (define texlive->guix-package
   (memoize
    (lambda* (package-name #:optional (component "latex"))
      "Fetch the metadata for PACKAGE-NAME from REPO and return the `package'
 s-expression corresponding to that package, or #f on failure."
-     (and=> (fetch-sxml package-name)
-            (cut sxml->package <> component)))))
+     (tlpdb->package package-name))))
 
 ;;; ctan.scm ends here
