From mboxrd@z Thu Jan 1 00:00:00 1970 From: ludo@gnu.org (Ludovic =?utf-8?Q?Court=C3=A8s?=) Subject: Re: hackage importer Date: Thu, 26 Mar 2015 14:09:55 +0100 Message-ID: <871tkbdhwc.fsf@gnu.org> References: <87k2yiqqaw.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:35366) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yb7Xu-0002Ep-IE for guix-devel@gnu.org; Thu, 26 Mar 2015 09:10:04 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Yb7Xq-0002XX-G9 for guix-devel@gnu.org; Thu, 26 Mar 2015 09:10:02 -0400 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:44618) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yb7Xq-0002XB-Ch for guix-devel@gnu.org; Thu, 26 Mar 2015 09:09:58 -0400 In-Reply-To: (Federico Beffa's message of "Sun, 22 Mar 2015 21:12:44 +0100") List-Id: "Development of GNU Guix and the GNU System distribution." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org Sender: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org To: Federico Beffa Cc: Guix-devel Federico Beffa skribis: > I've created a test for hackage. In doing so I've had an issue with the > locale and would like an advice: When I run "guix import hackage > package-name" from the shell, or when I invoke a REPL with Geiser > everything works fine. When I initially did run the test-suite with > > make check TESTS=3Dtests/hackage.scm > > the conditionals conversion looked wrong. I've found that this is > related to the choice of locale. To make the test work as desired I > added '(setlocale LC_ALL "en_US.UTF-8")' to the file declaring the > tests. What=E2=80=99s the exact error you were getting? Character classes in rege= xps are locale sensitive, I think, which could be one reason things behave differently. There=E2=80=99s also the problem that, when running in the C locale, =E2=80=98regexp-exec=E2=80=99 (and other C-implemented procedures) = cannot be passed any string that is not strictly ASCII. Could you post the actual backtrace you get (?) when running the program with LC_ALL=3DC? [...] >> I would expect the conversion of conditional expressions to sexps to be >> done in the parsing phase above, such that =E2=80=98read-cabal=E2=80=99 = returns an >> object with some sort of an AST for those conditionals. >> >> Then this part would focus on the evaluation of those conditionals, >> like: >> >> ;; Evaluate the conditionals in CABAL according to FLAGS. Return an >> ;; evaluated Cabal object. >> (eval-cabal cabal flags) >> >> WDYT? > > I think it's better to keep the parsing phase to a bare minimum and work > with layers to keep each function as manageable and simple as possible. > > The way it works is as follows: > > 1. 'strip-cabal' reads the file, strips comment and empty lines and > return a list. Each element of the list is a line from the Cabal file. > > 2. 'read-cabal' takes the above list and does the parsing required to > read the indentation based structure of the file. It returns a list > composed of list pairs. The pair is composed by a list of keys and a > list of values. For example, the following snippet > > name: foo > version: 1.0 > ... > > is returned as > > ((("name") ("foo")) > (("version") ("1.0")) > ...) OK. I would rather have =E2=80=98read-cabal=E2=80=99 take an input port (l= ike Scheme=E2=80=99s =E2=80=98read=E2=80=99) and return the list above; this would be the least = surprising, more idiomatic approach. =E2=80=98strip-cabal=E2=80=99 (or =E2=80=98strip-insignificant-lines=E2=80=99?) would be an internal procedur= e used only by =E2=80=98read-cabal=E2=80=99. WDYT? > This is enough for all the information that we need to build a Guix > package, but dependencies. > > Dependencies are listed in 'Library' and 'Executable cabal' sections of > the Cabal file as in the following example snippet: > > executable cabal > build-depends: > HTTP >=3D 4000.2.5 && < 4000.3 > ... > > and may include conditionals as in (continuing from above with the > proper indentation) > > if os(windows) > build-depends: Win32 >=3D 2 && < 3 > else > build-depends: unix >=3D 2.0 && < 2.8 > ... > > Now, to make sense of the indentation based structure I need to keep a > state indicating how many indentation levels we currently have and the > name of the various sub-sections. For this reason I keep a one to one > correspondence between indentation levels and section titles. That means > that the above snipped is encoded as follows in my Cabal object: > > ((("executable cabal" "build-depends:") ("HTTP...")) > (("executable cabal" "if os(windows)" "build-depends:") ("Win32 ...")) > (("executable cabal" "else" "build-depends:") ("unix ...")) > ...) > > If I split 'if' from the predicate 'os(windows)' then the level of > indentation and the corresponding section header loose synchronization > (different length). Would it be possible for =E2=80=98read-cabal=E2=80=99 to instead return a tree-structured sexp like: (if (os windows) (build-depends (Win32 >=3D 2 && < 3)) (build-depends (unix >=3D 2.0 && < 2.8))) That would use a variant of =E2=80=98conditional->sexp-like=E2=80=99, essen= tially. (Of course the achieve that the parser must keep track of indentation levels and everything, as you already implemented; I=E2=80=99m just comment= ing on the interface here.) Then, if I could imagine: (eval-cabal '(("name" "foo") ("version" "1.0" ("executable cabal" (if (os windows) ...))) =3D> # This way the structure of the Cabal file would be preserved, only converted to sexp form, which is easier to work with. Does that make sense? > For this reason I only convert the predicates from Cabal syntax to > Scheme syntax when I take the list of dependencies and convert the list > in the form expected in a guix package. > > I hope to have clarified the strategy. > >>> +(define (guix-name name) >> >> Rather =E2=80=98hackage-name->package-name=E2=80=99? > > OK > >>> +;; Split the comma separated list of dependencies coming from the caba= l file >>> +;; and return a list with inputs suitable for the GUIX package. Curre= ntly the >>> +;; version information is discarded. >> >> s/GUIX/Guix/ > > OK > >>> +(define (split-dependencies ls) >>> + (define (split-at-comma d) >>> + (map >>> + (lambda (m) >>> + (let ((name (guix-name (match:substring m 1)))) >>> + (list name (list 'unquote (string->symbol name))))) >>> + (list-matches dependencies-rx d))) >> >> I think it could use simply: >> >> (map string-trim-both >> (string-tokenize d (char-set-complement (char-set #\,)))) > > Actually, this wouldn't do. On top of splitting at commas, the function = also > translates the name of the package from hackage-name to guix-name. Right, but then it=E2=80=99s best to have two separate procedures and two compose them: (map (compose hackage-name->package-name string-trim-both) (string-tokenize d (char-set-complement (char-set #\,)))) > From 231ea64519505f84e252a4b3ab14d3857a9374c2 Mon Sep 17 00:00:00 2001 > From: Federico Beffa > Date: Sat, 7 Mar 2015 17:23:14 +0100 > Subject: [PATCH 1/5] import: Add hackage importer. >=20 > * guix/scripts/import.scm (importers): Add hackage. [...] > From ad79b3b0bc8b08ec98f49b665b32ce9259516713 Mon Sep 17 00:00:00 2001 > From: Federico Beffa > Date: Sat, 7 Mar 2015 17:30:17 +0100 > Subject: [PATCH 2/5] import: Add hackage importer. >=20 > * guix/scripts/import/hackage.scm: New file. These two should be one patch, along with the modification of POTFILES.in, because these 3 things correspond to a single logical change=E2=80=93the addition of a sub-command and associated infrastructure.= A fourth part is also needed: the addition of a couple of paragraphs in guix.texi. > From 447c6b8f1ee6cc1d4edba44cabef1b28b75c7608 Mon Sep 17 00:00:00 2001 > From: Federico Beffa > Date: Sun, 8 Mar 2015 07:48:38 +0100 > Subject: [PATCH 3/5] import: Add hackage importer. >=20 > * guix/import/hackage.scm: New file. Please add tests/hackage.test as part of the same patch. > +;; (use-modules (ice-9 match)) > +;; (use-modules (ice-9 regex)) > +;; (use-modules (ice-9 rdelim)) > +;; (use-modules (ice-9 receive)) > +;; (use-modules (ice-9 pretty-print)) > +;; (use-modules (srfi srfi-26)) > +;; (use-modules (srfi srfi-11)) > +;; (use-modules (srfi srfi-1)) Debugging leftover? Nothing to add to what I wrote above. [...] > +(define test-cabal-1 > + "name: foo > +version: 1.0.0 > +homepage: http://test.org > +synopsis: synopsis > +description: description > +license: BSD3 > +executable cabal > + build-depends: > + HTTP >=3D 4000.2.5 && < 4000.3, > + mtl >=3D 2.0 && < 3 > +") > + > +(define test-cond-1 > + "(os(darwin) || !(flag(debug))) && flag(cips)") > + > +(define read-cabal > + (@@ (guix import hackage) read-cabal)) > + > +(define strip-cabal > + (@@ (guix import hackage) strip-cabal)) > + > +(define eval-tests > + (@@ (guix import hackage) eval-tests)) > + > +(define eval-impl > + (@@ (guix import hackage) eval-impl)) > + > +(define eval-flags > + (@@ (guix import hackage) eval-flags)) > + > +(define conditional->sexp-like > + (@@ (guix import hackage) conditional->sexp-like)) I think it would be fine to export =E2=80=98read-cabal=E2=80=99 and =E2=80= =98eval-cabal=E2=80=99 once these two have the desired interface. > +(test-begin "hackage") > + > +(test-assert "hackage->guix-package" > + ;; Replace network resources with sample data. > + (mock > + ((guix import hackage) hackage-fetch > + (lambda (name-version) > + (read-cabal > + (call-with-input-string test-cabal-1 > + strip-cabal)))) > + (match (hackage->guix-package "foo") > + (('package > + ('name "ghc-foo") > + ('version "1.0.0") > + ('source > + ('origin > + ('method 'url-fetch) > + ('uri ('string-append > + "http://hackage.haskell.org/package/foo/foo-" > + 'version > + ".tar.gz")) > + ('sha256 > + ('base32 > + (? string? hash))))) > + ('build-system 'haskell-build-system) > + ('inputs > + ('quasiquote > + (("ghc-http" ('unquote 'ghc-http)) > + ("ghc-mtl" ('unquote 'ghc-mtl))))) > + ('home-page "http://test.org") > + ('synopsis (? string?)) > + ('description (? string?)) > + ('license 'bsd-3)) > + #t) Nice. With the adjusted semantics, there could be lower-level tests: (test-equal "read-cabal" '(("name" "foo") ...) (call-with-input-string test-cabal-1 read-cabal)) and: (test-assert "eval-cabal, conditions" (let ((p (eval-cabal '(("name" "foo") ("executable cabal" (if (os windows) ...)))))) (and (cabal-package? p) (string=3D? "foo" (cabal-package-name p)) (equal? '(unix) (cabal-package-dependencies p))))) Looks like we=E2=80=99re almost there. I hope the above makes sense! Thank you, Ludo=E2=80=99.