ludo@gnu.org (Ludovic Courtès) writes: > I would use ‘ghc-’ right from the start, since that’s really what it > is. WDYT? Agreed! >> Obviously the tests in part 5 were used along the way and will be >> removed. > > More precisely, they’ll have to be turned into tests/hackage.scm > (similar to tests/snix.scm and tests/pypi.scm.) :-) I've created a test for hackage. In doing so I've had an issue with the locale and would like an advice: When I run "guix import hackage package-name" from the shell, or when I invoke a REPL with Geiser everything works fine. When I initially did run the test-suite with make check TESTS=tests/hackage.scm the conditionals conversion looked wrong. I've found that this is related to the choice of locale. To make the test work as desired I added '(setlocale LC_ALL "en_US.UTF-8")' to the file declaring the tests. I suppose that running the 'guix import ...' command does get the locale from the shell. Should I somehow force the locale within the '(guix import hackage)' module? >> +;; List of libraries distributed with ghc (7.8.4). >> +(define ghc-standard-libraries >> + '("haskell98" ; 2.0.0.3 >> + "hoopl" ; 3.10.0.1 > > Maybe the version numbers in comments can be omitted since they’ll > become outdated eventually? OK. >> +(define guix-name-prefix "haskell-") > > s/guix-name-prefix/package-name-prefix/ and rather “ghc-” IMO. Agreed. > >> +;; Regular expression matching "key: value" >> +(define key-value-rx >> + "([a-zA-Z0-9-]+): *(\\w?.*)$") >> + >> +;; Regular expression matching a section "head sub-head ..." >> +(define sections-rx >> + "([a-zA-Z0-9\\(\\)-]+)") >> + >> +;; Cabal comment. >> +(define comment-rx >> + "^ *--") > > Use (make-regexp ...) directly, and then ‘regexp-exec’ instead of > ‘string-match’. I already had in mind to change these before sending the e-mail, but I forgot. It's now done. > >> +;; Check if the current line includes a key >> +(define (has-key? line) >> + (string-match key-value-rx line)) >> + >> +(define (comment-line? line) >> + (string-match comment-rx line)) >> + >> +;; returns the number of indentation spaces and the rest of the line. >> +(define (line-indentation+rest line) > > Please turn all the comments above procedures this into docstrings. Done. > >> +;; Part 1 main function: read a cabal fila and filter empty lines and comments. >> +;; Returns a list composed by the pre-processed lines of the file. >> +(define (read-cabal port) > > s/fila/file/ > s/Returns/Return/ > > I would expect ‘read-cabal’ to return a record, say, that can be > directly manipulated (just like ‘read’ returns a Scheme object.) But > here it seems to return an intermediate parsing result, right? OK. I've renamed some functions and the new name should hopefully be a better choice. What was 'read-cabal' has become 'strip-cabal' (which only strips comments and empty lines). >> +;; Parses a cabal file in the form of a list of lines as produced by >> +;; READ-CABAL and returns its content in the following form: >> +;; >> +;; (((head1 sub-head1 ... key1) (value)) >> +;; ((head2 sub-head2 ... key2) (value2)) >> +;; ...). >> +;; >> +;; where all elements are strings. >> +;; >> +;; We try do deduce the format from the following document: >> +;; https://www.haskell.org/cabal/users-guide/developing-packages.html >> +;; >> +;; Key values are case-insensitive. We therefore lowercase them. Values are >> +;; case-sensitive. >> +;; >> +;; Currently only only layout structured files are parsed. Braces {} > > “only indentation-structured files” OK > >> +;; structured files are not handled. >> +(define (cabal->key-values lines) > > I think this is the one I would call ‘read-cabal’. > ... and I've followed you suggestion and renamed this function to 'read-cabal'. >> +;; Find if a string represent a conditional >> +(define condition-rx >> + (make-regexp "^if +(.*)$")) > > The comment should rather be “Regexp for conditionals.” and be placed > below ‘define’. OK > I would expect the conversion of conditional expressions to sexps to be > done in the parsing phase above, such that ‘read-cabal’ returns an > object with some sort of an AST for those conditionals. > > Then this part would focus on the evaluation of those conditionals, > like: > > ;; Evaluate the conditionals in CABAL according to FLAGS. Return an > ;; evaluated Cabal object. > (eval-cabal cabal flags) > > WDYT? I think it's better to keep the parsing phase to a bare minimum and work with layers to keep each function as manageable and simple as possible. The way it works is as follows: 1. 'strip-cabal' reads the file, strips comment and empty lines and return a list. Each element of the list is a line from the Cabal file. 2. 'read-cabal' takes the above list and does the parsing required to read the indentation based structure of the file. It returns a list composed of list pairs. The pair is composed by a list of keys and a list of values. For example, the following snippet name: foo version: 1.0 ... is returned as ((("name") ("foo")) (("version") ("1.0")) ...) This is enough for all the information that we need to build a Guix package, but dependencies. Dependencies are listed in 'Library' and 'Executable cabal' sections of the Cabal file as in the following example snippet: executable cabal build-depends: HTTP >= 4000.2.5 && < 4000.3 ... and may include conditionals as in (continuing from above with the proper indentation) if os(windows) build-depends: Win32 >= 2 && < 3 else build-depends: unix >= 2.0 && < 2.8 ... Now, to make sense of the indentation based structure I need to keep a state indicating how many indentation levels we currently have and the name of the various sub-sections. For this reason I keep a one to one correspondence between indentation levels and section titles. That means that the above snipped is encoded as follows in my Cabal object: ((("executable cabal" "build-depends:") ("HTTP...")) (("executable cabal" "if os(windows)" "build-depends:") ("Win32 ...")) (("executable cabal" "else" "build-depends:") ("unix ...")) ...) If I split 'if' from the predicate 'os(windows)' then the level of indentation and the corresponding section header loose synchronization (different length). For this reason I only convert the predicates from Cabal syntax to Scheme syntax when I take the list of dependencies and convert the list in the form expected in a guix package. I hope to have clarified the strategy. >> +(define (guix-name name) > > Rather ‘hackage-name->package-name’? OK >> +;; Split the comma separated list of dependencies coming from the cabal file >> +;; and return a list with inputs suitable for the GUIX package. Currently the >> +;; version information is discarded. > > s/GUIX/Guix/ OK >> +(define (split-dependencies ls) >> + (define (split-at-comma d) >> + (map >> + (lambda (m) >> + (let ((name (guix-name (match:substring m 1)))) >> + (list name (list 'unquote (string->symbol name))))) >> + (list-matches dependencies-rx d))) > > I think it could use simply: > > (map string-trim-both > (string-tokenize d (char-set-complement (char-set #\,)))) Actually, this wouldn't do. On top of splitting at commas, the function also translates the name of the package from hackage-name to guix-name. >> +;; Genrate an S-expression containing the list of dependencies. The generated > > “Generate” Ouch... soo many typos :-( >> +;; S-expressions may include conditionals as defined in the cabal file. >> +;; During this process we discard the version information of the packages. >> +(define (dependencies-cond->sexp meta) > > [...] > >> + (match (match:substring rx-result 1) >> + ((? (cut member <> >> + ;; GUIX names are all lower-case. >> + (map (cut string-downcase <>) >> + ghc-standard-libraries))) > > s/GUIX/Guix/ > > I find it hard to read. Typically, I would introduce: > > (define (standard-library? name) > (member name ghc-standard-libraries)) > > and use it here (with the assumption that ‘ghc-standard-libraries’ is > already lowercase.) OK, I've followed your advice. I initially kept ghc-standard-libraries capitalized as the original name, but it doesn't really make any sense. >> +;; Run some tests >> + >> +;; (display (cabal->key-values >> +;; (call-with-input-file "mtl.cabal" read-cabal))) >> +;; (display (cabal->key-values >> +;; (call-with-input-file "/home/beffa/tmp/cabal-install.cabal" read-cabal))) >> +;; (display (get-flags (pre-process-entries-keys (cabal->key-values test-5)))) >> +;; (newline) >> +;; (display (conditional->sexp-like test-cond-2)) >> +;; (newline) >> +;; (display >> +;; (eval-flags (conditional->sexp-like test-cond-6) >> +;; (get-flags (pre-process-entries-keys (cabal->key-values test-6))))) >> +;; (newline) >> +;; (key->values (cabal->key-values test-1) "name") >> +;; (newline) >> +;; (key-start-end->entries (cabal->key-values test-4) "Library" "CC-Options") >> +;; (newline) >> +;; (eval-tests (conditional->sexp-like test-cond-6)) > > This should definitely go to tests/hackage.scm. As mentioned above, I've created a test-suite for the hackage and added a couple of thests. > >> + (display (_ "Usage: guix import hackage PACKAGE-NAME >> +Import and convert the Hackage package for PACKAGE-NAME. If PACKAGE-NAME >> +includes a suffix constituted by a dash followed by a numerical version (as >> +used with GUIX packages), then a definition for the specified version of the > > s/GUIX/Guix/ OK > Also, guix/scripts/import.scm must be added to po/guix/POTFILES.in, for > i18n. OK, I've added it. Thanks for the review! Fede