unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
From: Federico Beffa <beffa@ieee.org>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: 21829@debbugs.gnu.org
Subject: bug#21829: guix import hackage failures
Date: Sat, 14 Nov 2015 15:37:35 +0100	[thread overview]
Message-ID: <CAKrPhPMQ7fFTURcz83XZkZ2ho4OBdQ7PHWb4EFwySOUQobkhUg@mail.gmail.com> (raw)
In-Reply-To: <87h9kp1ts2.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 3107 bytes --]

On Fri, Nov 13, 2015 at 10:19 PM, Ludovic Courtès <ludo@gnu.org> wrote:
> Federico Beffa <beffa@ieee.org> skribis:
>
>> ---------------------------------------------------------------
>> (define (canonical-newline-port port)
>>   "Return an input port that wraps PORT such that all newlines consist
>>   of a single carriage return."
>>   (define (get-position)
>>     (if (port-has-port-position? port) (port-position port) #f))
>>   (define (set-position! position)
>>     (if (port-has-set-port-position!? port)
>>         (set-port-position! position port)
>>         #f))
>>   (define (close) (close-port port))
>>   (define (read! bv start n)
>>     (let loop ((count 0)
>>                (byte (get-u8 port)))
>>       (cond ((or (eof-object? byte) (= count n)) count)
>
> BYTE is lost here in the case it is not EOF.

Ooops. Thanks for catching it!

> It may be best to move the (= count n) case right before the recursive
> call below.
>
>>             ((eqv? byte (char->integer #\return)) (loop count (get-u8 port)))
>
> In practice this discards LF even if it’s not following CR; that’s
> probably a good enough approximation, but an XXX comment would be
> welcome.

This is intentional because, in my ignorance, I only know of uses of
'\r' before or after '\n'. Do you know of any other use in text files?

I've added an "XXX" comment, but I'm not sure what's its use.

>
>>             (else
>>              (bytevector-u8-set! bv (+ start count) byte)
>>              (loop (+ count 1) (get-u8 port))))))
>>   (make-custom-binary-input-port "canonical-newline-port"
>>                                  read!
>>                                  get-position
>>                                  set-position!
>>                                  close))
>> ---------------------------------------------------------------
>>
>> IMO this is general enough that it could go into "guix/utils.scm". Are
>> you OK with this?
>
> Looks good!  Could you make a patch that does that, along with adding a
> test or two in tests/utils.scm?

The attached patches fix the parsing of all but two of the failures
reported by Paul.
Two cabal files are still not imported correctly because they are buggy:

* streaming-commons: indentation changes from 4 to 2. But this is
explicitly forbidden. From [1]: "Field names may be indented, but all
field values in the same section must use the same indentation."

* fgl: uses braces to delimit the value of a field. As far as I
understand this is not allowed by [1]: "To continue a field value,
indent the next line relative to the field name." and "Flags,
conditionals, library and executable sections use layout to indicate
structure. ... As an alternative to using layout you can also use
explicit braces {}. ". Thus I understand that braces may be used to
delimit sections, not field values.

Obviously the official 'cabal' program is more permissive than the
description in the documentation.

Regards,
Fede

[1] https://www.haskell.org/cabal/users-guide/developing-packages.html

[-- Attachment #2: 0001-import-hackage-Add-recognition-of-true-and-false-sym.patch --]
[-- Type: text/x-diff, Size: 3028 bytes --]

From d13f06383d07e0ad4096ff7eb715264463738b0c Mon Sep 17 00:00:00 2001
From: Federico Beffa <beffa@fbengineering.ch>
Date: Wed, 11 Nov 2015 10:39:38 +0100
Subject: [PATCH 1/6] import: hackage: Add recognition of 'true' and 'false'
 symbols.

* guix/import/cabal.scm (is-true, is-false, lex-true, lex-false): New procedures.
  (lex-word): Use them.
  (make-cabal-parser): Add TRUE and FALSE tokens.
  (eval): Add entries for 'true and 'false symbols.
---
 guix/import/cabal.scm | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/guix/import/cabal.scm b/guix/import/cabal.scm
index 45d644a..8d84e09 100644
--- a/guix/import/cabal.scm
+++ b/guix/import/cabal.scm
@@ -138,7 +138,7 @@ to the stack."
   "Generate a parser for Cabal files."
   (lalr-parser
    ;; --- token definitions
-   (CCURLY VCCURLY OPAREN CPAREN TEST ID VERSION RELATION
+   (CCURLY VCCURLY OPAREN CPAREN TEST ID VERSION RELATION TRUE FALSE
            (right: IF FLAG EXEC TEST-SUITE SOURCE-REPO BENCHMARK LIB OCURLY)
            (left: OR)
            (left: PROPERTY AND)
@@ -206,6 +206,8 @@ to the stack."
    (if-then     (IF tests OCURLY exprs CCURLY) : `(if ,$2 ,$4 ())
                 (IF tests open exprs close)    : `(if ,$2 ,$4 ()))
    (tests       (TEST OPAREN ID CPAREN)        : `(,$1 ,$3)
+                (TRUE)                         : 'true
+                (FALSE)                        : 'false
                 (TEST OPAREN ID RELATION VERSION CPAREN)
                 : `(,$1 ,(string-append $3 " " $4 " " $5))
                 (TEST OPAREN ID RELATION VERSION AND RELATION VERSION CPAREN)
@@ -350,6 +352,10 @@ matching a string against the created regexp."
 
 (define (is-if s) (string-ci=? s "if"))
 
+(define (is-true s) (string-ci=? s "true"))
+
+(define (is-false s) (string-ci=? s "false"))
+
 (define (is-and s) (string=? s "&&"))
 
 (define (is-or s) (string=? s "||"))
@@ -424,6 +430,10 @@ string with the read characters."
 
 (define (lex-if loc) (make-lexical-token 'IF loc #f))
 
+(define (lex-true loc) (make-lexical-token 'TRUE loc #t))
+
+(define (lex-false loc) (make-lexical-token 'FALSE loc #f))
+
 (define (lex-and loc) (make-lexical-token 'AND loc #f))
 
 (define (lex-or loc) (make-lexical-token 'OR loc #f))
@@ -489,6 +499,8 @@ LOC is the current port location."
   (let* ((w (read-delimited " ()\t\n" port 'peek)))
     (cond ((is-if w) (lex-if loc))
           ((is-test w port) (lex-test w loc))
+          ((is-true w) (lex-true loc))
+          ((is-false w) (lex-false loc))
           ((is-and w) (lex-and loc))
           ((is-or w) (lex-or loc))
           ((is-id w) (lex-id w loc))
@@ -714,6 +726,8 @@ the ordering operation and the version."
       (('os name) (os name))
       (('arch name) (arch name))
       (('impl name) (impl name))
+      ('true #t)
+      ('false #f)
       (('not name) (not (eval name)))
       ;; 'and' and 'or' aren't functions, thus we can't use apply
       (('and args ...) (fold (lambda (e s) (and e s)) #t (eval args)))
-- 
2.4.3


[-- Attachment #3: 0002-import-hackage-Imporve-parsing-of-tests.patch --]
[-- Type: text/x-diff, Size: 1941 bytes --]

From 445f1b6197c0e266027ac033c52629d990137171 Mon Sep 17 00:00:00 2001
From: Federico Beffa <beffa@fbengineering.ch>
Date: Wed, 11 Nov 2015 11:22:42 +0100
Subject: [PATCH 2/6] import: hackage: Imporve parsing of tests.

* guix/import/cabal.scm (lex-word): Add support for tests with no spaces.
  (impl): Fix handling of operator "==".
---
 guix/import/cabal.scm | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/guix/import/cabal.scm b/guix/import/cabal.scm
index 8d84e09..615eca2 100644
--- a/guix/import/cabal.scm
+++ b/guix/import/cabal.scm
@@ -496,7 +496,7 @@ location."
 (define (lex-word port loc)
   "Process tokens which can be recognized by reading the next word form PORT.
 LOC is the current port location."
-  (let* ((w (read-delimited " ()\t\n" port 'peek)))
+  (let* ((w (read-delimited " <>=()\t\n" port 'peek)))
     (cond ((is-if w) (lex-if loc))
           ((is-test w port) (lex-test w loc))
           ((is-true w) (lex-true loc))
@@ -688,7 +688,11 @@ the ordering operation and the version."
                             (cut match:substring <> 2)))
            (version (and=> (with-ver-matcher-fn spec)
                            (cut match:substring <> 3))))
-      (values name operator version)))
+      (values name
+              (if (and (string? operator) (string= operator "=="))
+                  "="
+                  operator)
+              version)))
   
   (define (impl haskell)
     (let*-values (((comp-name comp-ver)
@@ -697,7 +701,7 @@ the ordering operation and the version."
                    (comp-spec-name+op+version haskell)))
       (if (and spec-ver comp-ver)
           (eval-string
-           (string-append "(string" spec-op " \"" comp-name "\""
+           (string-append "(string" spec-op " \"" comp-name "-" comp-ver "\""
                           " \"" spec-name "-" spec-ver "\")"))
           (string-match spec-name comp-name))))
   
-- 
2.4.3


[-- Attachment #4: 0003-import-hackage-Make-it-resilient-to-missing-final-ne.patch --]
[-- Type: text/x-diff, Size: 2583 bytes --]

From f796d814821289a98e401a3e3df13334a2e8689b Mon Sep 17 00:00:00 2001
From: Federico Beffa <beffa@fbengineering.ch>
Date: Wed, 11 Nov 2015 15:31:46 +0100
Subject: [PATCH 3/6] import: hackage: Make it resilient to missing final
 newline.

* guix/import/cabal.scm (peek-next-line-indent): Check for missing final
  newline.
---
 guix/import/cabal.scm | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/guix/import/cabal.scm b/guix/import/cabal.scm
index 615eca2..ca88ff5 100644
--- a/guix/import/cabal.scm
+++ b/guix/import/cabal.scm
@@ -226,19 +226,24 @@ to the stack."
   "This function can be called when the next character on PORT is #\newline
 and returns the indentation of the line starting after the #\newline
 character.  Discard (and consume) empty and comment lines."
-  (let ((initial-newline (string (read-char port))))
-    (let loop ((char (peek-char port))
-               (word ""))
-      (cond ((eqv? char #\newline) (read-char port)
-             (loop (peek-char port) ""))
-            ((or (eqv? char #\space) (eqv? char #\tab))
-             (let ((c (read-char port)))
-               (loop (peek-char port) (string-append word (string c)))))
-            ((comment-line port char) (loop (peek-char port) ""))
-            (else
-             (let ((len (string-length word)))
-               (unread-string (string-append initial-newline word) port)
-               len))))))
+  (if (eof-object? (peek-char port))
+      ;; If the file is missing the #\newline on the last line, add it and act
+      ;; as if it were there. This is needed for propoer operation of
+      ;; indentation based block recognition.
+      (begin (unread-char #\newline port) (read-char port) 0)
+      (let ((initial-newline (string (read-char port))))
+        (let loop ((char (peek-char port))
+                   (word ""))
+          (cond ((eqv? char #\newline) (read-char port)
+                 (loop (peek-char port) ""))
+                ((or (eqv? char #\space) (eqv? char #\tab))
+                 (let ((c (read-char port)))
+                   (loop (peek-char port) (string-append word (string c)))))
+                ((comment-line port char) (loop (peek-char port) ""))
+                (else
+                 (let ((len (string-length word)))
+                   (unread-string (string-append initial-newline word) port)
+                   len)))))))
 
 (define* (read-value port value min-indent #:optional (separator " "))
   "The next character on PORT must be #\newline.  Append to VALUE the
-- 
2.4.3


[-- Attachment #5: 0004-import-hackage-Make-parsing-of-tests-and-fields-more.patch --]
[-- Type: text/x-diff, Size: 2526 bytes --]

From 225164d2355afd6f9455251d87cbd34b08f68cdb Mon Sep 17 00:00:00 2001
From: Federico Beffa <beffa@fbengineering.ch>
Date: Wed, 11 Nov 2015 16:20:45 +0100
Subject: [PATCH 4/6] import: hackage: Make parsing of tests and fields more
 flexible.

* guix/import/cabal.scm (is-test): Allow spaces between keyword and
  parentheses.
  (is-id): Add argument 'port'.  Allow spaces between keyword and column.
  (lex-word): Adjust call to 'is-id'.
---
 guix/import/cabal.scm | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/guix/import/cabal.scm b/guix/import/cabal.scm
index ca88ff5..257afa5 100644
--- a/guix/import/cabal.scm
+++ b/guix/import/cabal.scm
@@ -332,7 +332,7 @@ matching a string against the created regexp."
                 (make-regexp pat))))
     (cut regexp-exec rx <>)))
 
-(define is-property (make-rx-matcher "([a-z0-9-]+):[ \t]*(\\w?.*)$"
+(define is-property (make-rx-matcher "([a-z0-9-]+)[ \t]*:[ \t]*(\\w?.*)$"
                                      regexp/icase))
 
 (define is-flag (make-rx-matcher "^flag +([a-z0-9_-]+)"
@@ -365,17 +365,24 @@ matching a string against the created regexp."
 
 (define (is-or s) (string=? s "||"))
 
-(define (is-id s)
+(define (is-id s port)
   (let ((cabal-reserved-words
          '("if" "else" "library" "flag" "executable" "test-suite"
-           "source-repository" "benchmark")))
+           "source-repository" "benchmark"))
+        (spaces (read-while (cut char-set-contains? char-set:blank <>) port))
+        (c (peek-char port)))
+    (unread-string spaces port)
     (and (every (cut string-ci<> s <>) cabal-reserved-words)
-         (not (char=? (last (string->list s)) #\:)))))
+         (and (not (char=? (last (string->list s)) #\:))
+              (not (char=? #\: c))))))
 
 (define (is-test s port)
   (let ((tests-rx (make-regexp "os|arch|flag|impl"))
+        (spaces (read-while (cut char-set-contains? char-set:blank <>) port))
         (c (peek-char port)))
-    (and (regexp-exec tests-rx s) (char=? #\( c))))
+    (if (and (regexp-exec tests-rx s) (char=? #\( c))
+        #t
+        (begin (unread-string spaces port) #f))))
 
 ;; Lexers for individual tokens.
 
@@ -508,7 +515,7 @@ LOC is the current port location."
           ((is-false w) (lex-false loc))
           ((is-and w) (lex-and loc))
           ((is-or w) (lex-or loc))
-          ((is-id w) (lex-id w loc))
+          ((is-id w port) (lex-id w loc))
           (else (unread-string w port) #f))))
 
 (define (lex-line port loc)
-- 
2.4.3


[-- Attachment #6: 0005-utils-Add-canonical-newline-port.patch --]
[-- Type: text/x-diff, Size: 3155 bytes --]

From 1b26410f4a7a920382750bffbf5381394acafdbc Mon Sep 17 00:00:00 2001
From: Federico Beffa <beffa@fbengineering.ch>
Date: Sat, 14 Nov 2015 15:00:36 +0100
Subject: [PATCH 5/6] utils: Add 'canonical-newline-port'.

* guix/utils.scm (canonical-newline-port): New procedure.
* tests/utils.scm ("canonical-newline-port"): New test.
---
 guix/utils.scm  | 34 ++++++++++++++++++++++++++++++++--
 tests/utils.scm |  6 ++++++
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/guix/utils.scm b/guix/utils.scm
index 1542e86..7b589e6 100644
--- a/guix/utils.scm
+++ b/guix/utils.scm
@@ -29,7 +29,8 @@
   #:use-module (srfi srfi-39)
   #:use-module (srfi srfi-60)
   #:use-module (rnrs bytevectors)
-  #:use-module ((rnrs io ports) #:select (put-bytevector))
+  #:use-module (rnrs io ports)
+  #:use-module ((rnrs bytevectors) #:select (bytevector-u8-set!))
   #:use-module ((guix build utils)
                 #:select (dump-port package-name->name+version))
   #:use-module ((guix build syscalls) #:select (errno mkdtemp!))
@@ -90,7 +91,8 @@
             decompressed-port
             call-with-decompressed-port
             compressed-output-port
-            call-with-compressed-output-port))
+            call-with-compressed-output-port
+            canonical-newline-port))
 
 \f
 ;;;
@@ -746,6 +748,34 @@ elements after E."
             (if success?
                 (loop (absolute target) (+ depth 1))
                 file))))))
+
+(define (canonical-newline-port port)
+  "Return an input port that wraps PORT such that all newlines consist
+  of a single carriage return."
+  (define (get-position)
+    (if (port-has-port-position? port) (port-position port) #f))
+  (define (set-position! position)
+    (if (port-has-set-port-position!? port)
+        (set-port-position! position port)
+        #f))
+  (define (close) (close-port port))
+  (define (read! bv start n)
+    (let loop ((count 0)
+               (byte (get-u8 port)))
+      (cond ((eof-object? byte) count)
+            ((= count (- n 1))
+             (bytevector-u8-set! bv (+ start count) byte)
+             n)
+            ;; XXX: consume all LFs even if not followed by CR.
+            ((eqv? byte (char->integer #\return)) (loop count (get-u8 port)))
+            (else
+             (bytevector-u8-set! bv (+ start count) byte)
+             (loop (+ count 1) (get-u8 port))))))
+  (make-custom-binary-input-port "canonical-newline-port"
+                                 read!
+                                 get-position
+                                 set-position!
+                                 close))
 \f
 ;;;
 ;;; Source location.
diff --git a/tests/utils.scm b/tests/utils.scm
index b65d6d2..9d345e0 100644
--- a/tests/utils.scm
+++ b/tests/utils.scm
@@ -318,6 +318,12 @@
    (string-append (%store-prefix)
                   "/qvs2rj2ia5vci3wsdb7qvydrmacig4pg-bash-4.2-p24")))
 
+(test-equal "canonical-newline-port"
+  "This is a journey"
+  (let ((port (open-string-input-port
+               "This is a journey\r\n")))
+    (get-line (canonical-newline-port port))))
+
 (test-end)
 
 (false-if-exception (delete-file temp-file))
-- 
2.4.3


[-- Attachment #7: 0006-import-hackage-Handle-CRLF-end-of-line-style.patch --]
[-- Type: text/x-diff, Size: 1856 bytes --]

From c57be8cae9b3642beff1462acd32a0aee54ad7c6 Mon Sep 17 00:00:00 2001
From: Federico Beffa <beffa@fbengineering.ch>
Date: Sat, 14 Nov 2015 15:15:00 +0100
Subject: [PATCH 6/6] import: hackage: Handle CRLF end of line style.

* guix/import/hackage.scm (hackage-fetch, hackage->guix-package): Do it.
---
 guix/import/hackage.scm | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/guix/import/hackage.scm b/guix/import/hackage.scm
index 3baa514..8725ffa 100644
--- a/guix/import/hackage.scm
+++ b/guix/import/hackage.scm
@@ -22,7 +22,8 @@
   #:use-module (srfi srfi-11)
   #:use-module (srfi srfi-1)
   #:use-module ((guix download) #:select (download-to-store))
-  #:use-module ((guix utils) #:select (package-name->name+version))
+  #:use-module ((guix utils) #:select (package-name->name+version
+                                       canonical-newline-port))
   #:use-module (guix import utils)
   #:use-module (guix import cabal)
   #:use-module (guix store)
@@ -84,7 +85,8 @@ version."
     (call-with-temporary-output-file
      (lambda (temp port)
        (and (url-fetch url temp)
-            (call-with-input-file temp read-cabal))))))
+            (call-with-input-file temp
+              (compose read-cabal canonical-newline-port)))))))
 
 (define string->license
   ;; List of valid values from
@@ -216,7 +218,7 @@ to the Cabal file format definition.  The default value associated with the
 keys \"os\", \"arch\" and \"impl\" is \"linux\", \"x86_64\" and \"ghc\"
 respectively."
   (let ((cabal-meta (if port
-                        (read-cabal port)
+                        (read-cabal (canonical-newline-port port))
                         (hackage-fetch package-name))))
     (and=> cabal-meta (compose (cut hackage-module->sexp <>
                                     #:include-test-dependencies? 
-- 
2.4.3


  reply	other threads:[~2015-11-14 14:38 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-04 15:00 bug#21829: guix import hackage failures Paul van der Walt
2015-11-04 23:11 ` Ludovic Courtès
2015-11-10 16:40 ` Federico Beffa
2015-11-11 11:18   ` Ludovic Courtès
2015-11-11 21:29     ` Federico Beffa
2015-11-12  9:07       ` Ludovic Courtès
2015-11-12 16:54         ` Federico Beffa
2015-11-12 20:21           ` Ludovic Courtès
2015-11-13 17:08             ` Federico Beffa
2015-11-13 21:19               ` Ludovic Courtès
2015-11-14 14:37                 ` Federico Beffa [this message]
2015-11-15 20:59                   ` Ludovic Courtès
2015-11-25 16:55                     ` Federico Beffa
2015-11-25 21:45                       ` Ludovic Courtès
2015-11-26  8:28                         ` Federico Beffa
2015-11-26  8:46                           ` Ludovic Courtès
2015-11-26 17:23                         ` Federico Beffa
2015-11-26 19:56                           ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKrPhPMQ7fFTURcz83XZkZ2ho4OBdQ7PHWb4EFwySOUQobkhUg@mail.gmail.com \
    --to=beffa@ieee.org \
    --cc=21829@debbugs.gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).