From: Ricardo Wurmus <rekado@elephly.net>
To: "Ludovic Courtès" <ludovic.courtes@inria.fr>
Cc: guix-devel@gnu.org
Subject: Re: Accuracy of importers?
Date: Thu, 28 Oct 2021 12:25:56 +0000
Message-ID: <87y26dxqz9.fsf@elephly.net>
In-Reply-To: <878ryd8we4.fsf@inria.fr>
Ludovic Courtès <ludovic.courtes@inria.fr> writes:
> Hello Guix!
>
> As I’m preparing my PackagingCon talk and wondering how language package
> managers could make our lives easier, I thought it’d be interesting to
> know how well our importers are doing.
>
> My understanding is that most of them require manual
> intervention—i.e.,
> one has to tweak what ‘guix import’ produces, even if we ignore
> synopsis/description/license, to set the right inputs, etc. If we were
> to estimate the fraction of imported packages for which manual changes
> are needed, what would it look like?
>
> importer    fraction of imported packages needing changes
[…]
> cran        5% (Ricardo? Simon? seems to almost always work?)
Like Lars and Simon wrote: the importers work *really* well for
both CRAN and Bioconductor, so much so that I’m using them in the
background here:
https://git.elephly.net/gitweb.cgi?p=software/r-guix-install.git;a=blob;f=guix-install.R;h=2766aa1f2d248a8ed2a4eb4c3244b85574d326e2;hb=HEAD
The biggest annoyance is the missing “license:” prefix when
packaging things for gnu/packages/cran.scm or
gnu/packages/bioconductor.scm. Descriptions regularly need clean-up
work (e.g. to complete sentences), even though we’re using some
heuristics to fix the most common stylistic problems. It’s really
not a big deal, though.
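A minimal sketch of that manual fix-up, assuming the importer prints a
bare license symbol such as gpl3+ and that the target module imports
(guix licenses) with the “license:” prefix. The helper names
prefix-license and add-license-prefix and the package r-foo are made
up for illustration; this is not code from Guix:

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper, not part of Guix.
(define (prefix-license sym)
  "Return SYM with a `license:' prefix, unless it already has one."
  (let ((str (symbol->string sym)))
    (if (string-prefix? "license:" str)
        sym
        (string->symbol (string-append "license:" str)))))

;; Hypothetical helper, not part of Guix.
(define (add-license-prefix package-expr)
  "Return PACKAGE-EXPR, a `package' s-expression as printed by an importer,
with the symbols in its `license' field prefixed by `license:'."
  (match package-expr
    (('package fields ...)
     `(package
        ,@(map (match-lambda
                 ;; A single license.
                 (('license (? symbol? sym))
                  `(license ,(prefix-license sym)))
                 ;; Several licenses wrapped in (list ...).
                 (('license ('list (? symbol? syms) ...))
                  `(license (list ,@(map prefix-license syms))))
                 ;; Leave every other field alone.
                 (field field))
               fields)))))

;; Made-up importer output, for illustration only.
(define imported
  '(package
     (name "r-foo")
     (license gpl3+)))

(add-license-prefix imported)
;; => (package (name "r-foo") (license license:gpl3+))
```

Only the top-level “license” field is rewritten, so a description or
argument that happens to contain the word “license” is left alone.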
The biggest missing feature is recursive import of dependencies
hosted on GitHub or in Mercurial repositories (with “-r -a git” or
“-r -a hg”). That is, when a package on GitHub declares a dependency
on another package that’s also only hosted on GitHub, the importer
fails to import that dependency. This is pretty rare, but it happens
with experimental bioinformatics software.
> texlive (Ricardo? Thiago? Marius?)
This one is not usable. I’d even add “at all”. I keep announcing
that one day I’ll replace it with a new importer, but that new
importer just isn’t ready yet.
> What about licensing info: which ones provide accurate licensing
> info?
> My guess:
>
> gnu
> pypi
> cpan
> cran
The CRAN importer is as accurate as upstream allows. CRAN
requires a free license; Bioconductor requires a license
declaration. There have been very few cases where the declared
license was not correct, but a number of cases where the license was
non-free, such as the Artistic 1.0 license. Bioconductor is sometimes
sneaky: the R code is free but a necessary library is not.
> texlive
Pretty terrible. The license declaration is generally too vague.
Licenses are often declared without version number, and sometimes
it’s just some generic “free” license. A new importer based on
texlive.tlpdb would not improve this by much, because the upstream
declarations are just spotty and unreliable.
--
Ricardo
PS: attached is a rough WIP patch of what I had been using to
import new texlive stuff.
[-- Attachment #2: texlive-import.diff --]
[-- Type: text/x-patch, Size: 6523 bytes --]
diff --git a/guix/import/texlive.scm b/guix/import/texlive.scm
index 18d8b95ee0..b94aa1cf40 100644
--- a/guix/import/texlive.scm
+++ b/guix/import/texlive.scm
@@ -19,10 +19,12 @@
 (define-module (guix import texlive)
   #:use-module (ice-9 match)
+  #:use-module (ice-9 rdelim)
   #:use-module (sxml simple)
   #:use-module (sxml xpath)
   #:use-module (srfi srfi-11)
   #:use-module (srfi srfi-1)
+  #:use-module (srfi srfi-2)
   #:use-module (srfi srfi-26)
   #:use-module (srfi srfi-34)
   #:use-module (web uri)
@@ -125,9 +127,9 @@ (define (fetch-sxml name)
       (xml->sxml (http-fetch url)
                  #:trim-whitespace? #t))))
-(define (guix-name component name)
+(define (guix-name name)
   "Return a Guix package name for a given Texlive package NAME."
-  (string-append "texlive-" component "-"
+  (string-append "texlive-"
                  (string-map (match-lambda
                                (#\_ #\-)
                                (#\. #\-)
@@ -186,12 +188,123 @@ (define (sxml-value path)
((lst ...) `(list ,@lst))
(license license)))))))
+(define tlpdb
+  (memoize
+   (lambda ()
+     (let ((file "/home/rekado/dev/gx/branches/master/texlive.tlpdb")
+           (fields
+            '((name . string)
+              (shortdesc . string)
+              (longdesc . string)
+              (catalogue-license . string)
+              (catalogue-ctan . string)
+              (srcfiles . list)
+              (runfiles . list)
+              (docfiles . list)
+              (depend . list)))
+           (record
+            (lambda* (key value alist #:optional (type 'string))
+              (let ((new
+                     (or (and=> (assoc-ref alist key)
+                                (lambda (existing)
+                                  (cond
+                                   ((eq? type 'string)
+                                    (string-append existing " " value))
+                                   ((eq? type 'list)
+                                    (cons value existing)))))
+                         (cond
+                          ((eq? type 'string)
+                           value)
+                          ((eq? type 'list)
+                           (list value))))))
+                (acons key new (alist-delete key alist))))))
+       (call-with-input-file file
+         (lambda (port)
+           (let loop ((all (list))
+                      (current (list))
+                      (last-property #false))
+             (let ((line (read-line port)))
+               (cond
+                ((eof-object? line) all)
+
+                ;; End of record.
+                ((string-null? line)
+                 (loop (cons (cons (assoc-ref current 'name) current)
+                             all)
+                       (list) #false))
+
+                ;; Continuation of a list
+                ((and (zero? (string-index line #\space)) last-property)
+                 ;; Erase optional second part of list values like
+                 ;; "details=Readme" for files
+                 (let ((plain-value (first
+                                     (string-split
+                                      (string-trim-both line) #\space))))
+                   (loop all (record last-property
+                                     plain-value
+                                     current
+                                     'list)
+                         last-property)))
+                (else
+                 (or (and-let* ((space (string-index line #\space))
+                                (key (string->symbol (string-take line space)))
+                                (value (string-drop line (1+ space)))
+                                (field-type (assoc-ref fields key)))
+                       ;; Erase second part of list keys like "size=29"
+                       (if (eq? field-type 'list)
+                           (loop all current key)
+                           (loop all (record key value current field-type) key)))
+                     (loop all current #false))))))))))))
+
+(define (files->directories files)
+  (map (cut string-join <> "/" 'suffix)
+       (delete-duplicates (map (lambda (file)
+                                 (drop-right (string-split file #\/) 1))
+                               files)
+                          equal?)))
+
+(define (tlpdb->package name)
+  (and-let* ((data (assoc-ref (tlpdb) name))
+             (dirs (files->directories
+                    (append (or (assoc-ref data 'docfiles) (list))
+                            (or (assoc-ref data 'runfiles) (list))
+                            (or (assoc-ref data 'srcfiles) (list))))))
+    (pk data)
+    ;; TODO
+    `(package
+       (name ,(guix-name name))
+       (version (number->string %texlive-revision))
+       (source (texlive-origin name version
+                               ',dirs
+                               (base32
+                                "TODO"
+                                #;
+                                ,(bytevector->nix-base32-string
+                                  (let-values (((port get-hash) (open-sha256-port)))
+                                    (write-file checkout port)
+                                    (force-output port)
+                                    (get-hash))))))
+       (build-system texlive-build-system)
+       (arguments ,`(,'quote (#:tex-directory "TODO")))
+       ,@(or (and=> (assoc-ref data 'depend)
+                    (lambda (inputs)
+                      `((propagated-inputs ,inputs))))
+             '())
+       ,@(or (and=> (assoc-ref data 'catalogue-ctan)
+                    (lambda (url)
+                      `((home-page ,(string-append "https://ctan.org" url)))))
+             '((home-page "https://www.tug.org/texlive/")))
+       (synopsis ,(assoc-ref data 'shortdesc))
+       (description ,(beautify-description
+                      (assoc-ref data 'longdesc)))
+       (license ,(string->license
+                  (assoc-ref data 'catalogue-license))))))
+
 (define texlive->guix-package
   (memoize
    (lambda* (package-name #:optional (component "latex"))
      "Fetch the metadata for PACKAGE-NAME from REPO and return the `package'
 s-expression corresponding to that package, or #f on failure."
-     (and=> (fetch-sxml package-name)
-            (cut sxml->package <> component)))))
+     (tlpdb->package package-name))))
 ;;; ctan.scm ends here
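
To illustrate files->directories from the patch above: it reduces a
list of file names to the unique directories that contain them, each
with a trailing slash. The sketch below repeats the definition so
that it runs on its own in Guile; the file names are made up and only
mimic the layout of a TeX Live tree:

```scheme
(use-modules (srfi srfi-1)    ;delete-duplicates, drop-right
             (srfi srfi-26))  ;cut

;; Same definition as in the patch.
(define (files->directories files)
  (map (cut string-join <> "/" 'suffix)
       (delete-duplicates (map (lambda (file)
                                 (drop-right (string-split file #\/) 1))
                               files)
                          equal?)))

;; Hypothetical file names, for illustration only.
(define example-files
  '("doc/latex/foo/README"
    "tex/latex/foo/foo.sty"
    "tex/latex/foo/foo.cls"))

(files->directories example-files)
;; => ("doc/latex/foo/" "tex/latex/foo/")
```

The two files under tex/latex/foo/ collapse into a single directory
entry, which is the list that tlpdb->package splices into the
texlive-origin form.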