From mboxrd@z Thu Jan 1 00:00:00 1970 From: ludo@gnu.org (Ludovic =?utf-8?Q?Court=C3=A8s?=) Subject: Re: hackage importer Date: Sat, 02 May 2015 14:48:10 +0200 Message-ID: <87pp6j6t85.fsf@gnu.org> References: <87k2yiqqaw.fsf@gnu.org> <871tkbdhwc.fsf@gnu.org> <871tk7ykfj.fsf@gnu.org> <87zj6t9tq5.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:38836) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YoWqB-0007Kb-DE for guix-devel@gnu.org; Sat, 02 May 2015 08:48:21 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YoWq6-0007c8-VO for guix-devel@gnu.org; Sat, 02 May 2015 08:48:19 -0400 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:34837) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YoWq6-0007c2-RF for guix-devel@gnu.org; Sat, 02 May 2015 08:48:14 -0400 In-Reply-To: (Federico Beffa's message of "Sun, 26 Apr 2015 13:38:33 +0200") List-Id: "Development of GNU Guix and the GNU System distribution." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org Sender: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org To: Federico Beffa Cc: Guix-devel Federico Beffa skribis: > Please find attached a patch reorganizing the code as you suggest. Woow, neat! Impressive work. I think this is a great improvement. I have a bunch of stylistic comments below, and some open questions as well. > From bc8cdab1e322a25002a3d9cf33eddd856c8a81d8 Mon Sep 17 00:00:00 2001 > From: Federico Beffa > Date: Sun, 26 Apr 2015 11:22:29 +0200 > Subject: [PATCH] import: hackage: Refactor parsing code and add new optio= n. > > * guix/import/cabal.scm: New file. > > * guix/import/hackage.scm: Update to use the new Cabal parsing module. > > * tests/hackage.scm: Update tests for private functions. > > * guix/scripts/import/hackage.scm: Add new '--cabal-environment' option. > > * doc/guix.texi: ... and document it. > > * Makefile.am (MODULES): Add 'guix/import/cabal.scm', > 'guix/import/hackage.scm' and 'guix/scripts/import/hackage.scm'. > (SCM_TESTS): Add 'tests/hackage.scm'. No newlines between entries. [...] > +;; This record stores the state information needed during parsing of Cab= al > +;; files. > +(define-record-type > + (make-cabal-parse-state lines minimum-indent indents conditionals > + true-group? true-group false-group > + true-group?-stack true-group-stack false-group= -stack) [...] > + (make-cabal-parse-state lines -1 '() '() #t '() '() '() '() = '()))) I=E2=80=99m not claiming this must done now, but it may improve readability= to use =E2=80=98define-record-type*=E2=80=99. That way, with appropriate fiel= d default values, one could write something like: (cabal-parse-state (lines lines)) That would also allow the use of =E2=80=98inherit=E2=80=99, which is slight= ly less verbose than =E2=80=98set-fields=E2=80=99. WDYT? Besides, could you add comments to explain the meaning of the various fields? I=E2=80=99m particularly curious about =E2=80=98true-group?=E2=80= =99 & co. ;-) > +(define (parse-cabal result) > + "Parse a Cabal file and append its content to RESULT (a list). Return= the > +updated result as a monadic value in the state monad." > + (mlet* %state-monad ((state (current-state))) > + (match state > + (($ lines minimum-indent indents conditionals > + true-group? true-group false-group > + true-group?-stack true-group-stack > + false-group-stack) > + (let*-values > + (((current-indent line) > + (if (null? lines) > + (values 0 "") > + (line-indentation+rest (first lines)))) > + ((next-line-indent next-line) > + (if (or (null? lines) (null? (cdr lines))) > + (values 0 "") > + (line-indentation+rest (second lines)))) > + ((key-value-rx-result) (has-key? line)) > + ((end-of-file?) (null? lines)) > + ((is-simple-key-value?) (and (=3D next-line-indent current-i= ndent) > + key-value-rx-result)) > + ((is-multi-line-key-value?) (and (> next-line-indent current= -indent) > + key-value-rx-result)) > + ((key) (and=3D> key-value-rx-result > + (lambda (rx-res) > + (string-downcase (match:substring rx-res 1))= ))) > + ((value) (and=3D> key-value-rx-result (cut match:substring <= > 2)))) > + (cond > + (end-of-file? (return (reverse result))) > + (is-simple-key-value? > + (>>=3D (state-add-entry (list key `(,value)) result (cdr line= s)) > + parse-cabal)) > + (is-multi-line-key-value? > + (let*-values=20 > + (((value-lst lines) > + (multi-line-value (cdr lines) > + (if (string-null? value) '() `(,value= ))))) > + (>>=3D (state-add-entry (list key value-lst) result lines) > + parse-cabal))) > + (else ; it's a section > + (let* ((section-header (string-tokenize (string-downcase line= ))) > + (section-type (string->symbol (first section-header))) > + (section-name (if (> (length section-header) 1) > + (second section-header) > + ""))) > + (mbegin %current-monad > + (set-current-state=20 > + (set-fields state > + ((cabal-parse-state-minimum-indent) current-= indent) > + ((cabal-parse-state-lines) (cdr lines)))) > + (>>=3D > + (>>=3D (parse-cabal-section '()) > + (lambda (section-contents) > + (mlet* %state-monad ((state (current-state))) > + (mbegin %current-monad > + (set-current-state > + (set-fields state > + ((cabal-parse-state-minimum-inde= nt) -1))) > + (return=20 > + (cons (append > + (if (string-null? section-name) > + (list 'section section-type) > + (list 'section section-type secti= on-name)) > + (list section-contents)) > + result)))))) > + parse-cabal)))))))))) This procedure is intimidating. I think this is partly due to its length, to the big let-values, the long identifiers, the many local variables, nested binds, etc. Would it be possible to create auxiliary procedures that would help? I=E2=80=99m thinking of procedures that take a and retu= rn the necessary info, like =E2=80=98cabal-parse-state-indentation=E2=80=99, =E2=80=98cabal-parse-state-key=E2=80=99, =E2=80=98cabal-parse-state-multili= ne?=E2=80=99, =E2=80=98cabal-parse-state-eof?=E2=80=99, etc. WDYT? Also, please try hard to avoid car and cdr and use =E2=80=98match=E2=80=99 = instead. (BTW it=E2=80=99s a good idea to use the state monad here!) > +(define (parse-cabal-section result) > + "Parse a section of a cabal file and append its content to RESULT (a l= ist). > +Return the updated result as a value in the state monad." > + (mlet* %state-monad ((state (current-state))) > + (match state > + (($ lines minimum-indent indents conditionals > + true-group? true-group false-group > + true-group?-stack true-group-stack > + false-group-stack) > + (let*-values > + (((current-indent line) > + (if (null? lines) > + (values 0 "") > + (line-indentation+rest (first lines)))) > + ((next-line-indent next-line) > + (if (or (null? lines) (null? (cdr lines))) > + (values 0 "") > + (line-indentation+rest (second lines)))) > + ((key-value-rx-result) (has-key? line)) > + ((end-of-section?) (or (<=3D current-indent minimum-indent) > + (null? lines))) > + ;; If this is the last line of the section, then it can't be= the > + ;; start of a conditional or an 'else'. > + ((last-line-of-section?) (<=3D next-line-indent minimum-inde= nt)) > + ((is-simple-key-value?) (or (and (=3D next-line-indent curre= nt-indent) > + key-value-rx-result) > + (and (pair? conditionals) > + (=3D next-line-indent (firs= t indents)) > + (string-prefix? "else" next= -line)))) > + ((is-multi-line-key-value?) (and (> next-line-indent current= -indent) > + key-value-rx-result)) > + ((end-of-cond?) > + (and (pair? conditionals) > + (or (and (=3D next-line-indent (first indents)) > + (not (string-prefix? "else" next-line))) > + (< next-line-indent (first indents))))) > + ((is-else?) (and (pair? conditionals) > + (=3D current-indent (first indents)) > + (string-prefix? "else" line))) > + ((condition) (cabal-conditional-line->sexp line)) > + ((key) (and=3D> key-value-rx-result > + (lambda (rx-res) > + (string-downcase (match:substring rx-res 1))= ))) > + ((value) (and=3D> key-value-rx-result > + (cut match:substring <> 2)))) > + (cond > + (end-of-section? > + (if (pair? indents) > + (state-reduce-indentation (1- (length indents)) #f result= lines) > + (return result))) > + (last-line-of-section? > + (if (pair? indents) > + (state-reduce-indentation > + (1- (length indents)) (list key `(,value)) result (cdr l= ines)) > + (mbegin %current-monad > + (set-current-state=20 > + (set-fields state ((cabal-parse-state-lines) (cdr line= s)))) > + (return (cons (list key `(,value)) result))))) > + (is-simple-key-value? > + (>>=3D (state-add-entry (list key `(,value)) result (cdr line= s)) > + parse-cabal-section)) > + (is-multi-line-key-value? > + (let*-values > + ;; VALUE-LST is the full multi-line value and LINES are t= he > + ;; remaining lines to be parsed (from the line following = the > + ;; multi-line value). We need to check if we are at the = end of > + ;; a conditional or at the end of the section. > + (((value-lst lines) > + (multi-line-value (cdr lines) > + (if (string-null? value) '() `(,value= )))) > + ((ind line) (if (null? lines) > + (values 0 "") > + (line-indentation+rest (first lines)))) > + ((end-of-cond?) (and (pair? conditionals) > + (or (and (=3D ind (first indents)) > + (not (string-prefix? "else= " line))) > + (< ind (first indents))))) > + ;; If IND is not in INDENTS, assume that we are at the e= nd of > + ;; the section. > + ((idx) (or (and=3D> > + (list-index (cut =3D ind <>) indents) > + (cut + <> (if (string-prefix? "else" line) -= 1 0))) > + (1- (length indents))))) > + (if end-of-cond? > + (>>=3D (state-reduce-indentation idx (list key value-ls= t) > + result lines) > + parse-cabal-section) > + (>>=3D (state-add-entry (list key value-lst) result lin= es) > + parse-cabal-section)))) > + (end-of-cond? > + (let ((idx (+ (list-index (cut =3D next-line-indent <>) inden= ts) > + (if (string-prefix? "else" next-line) -1 0)))) > + (>>=3D (state-reduce-indentation idx (list key `(,value)) r= esult > + (if (pair? lines) (cdr lines) '= ())) > + parse-cabal-section))) > + (is-else? > + (mbegin %current-monad > + (set-current-state=20 > + (set-fields state > + ((cabal-parse-state-lines) (cdr lines)) > + ((cabal-parse-state-true-group?) #f))) > + (parse-cabal-section result))) > + (condition > + (mbegin %current-monad > + (state-add-conditional condition current-indent) > + (parse-cabal-section result))))))))) This one is also very intimidating and it seems to duplicate some of the code of the previous one, so maybe the propose procedures will help here as well. > +(define (state-reduce-indentation index entry result lines) s/reduce/decrease/ > + "Given RESULT, if ENTRY is not #f, add it as appropriate and return the > +updated result as a value in the state monad. Update the state accordin= g to > +the reduction of the indentation level specified by INDEX, an index of an s/reduction/decrease/ > +entry in the 'indentations' field of the state. Could you explain what RESULT and ENTRY are? Also, it seems to do two different things: compute a value function of RESULT and ENTRY, and update the current indentation. Should it be two separate procedures? > As an example, if there are > +two nested conditional levels, the first starting at indentation 2 and t= he > +second at indentation 4, then the 'indentations' field of state is '(4 2= ) and > +an INDEX value of 0 means that the second conditional is finished. Set = the > +remaining lines to be parsed to LINES." > + (lambda (state) > + (match state > + (($ _ minimum-indent indents conditionals > + true-group? true-group false-group > + true-group?-stack true-group-stack > + false-group-stack) > + ;; The suffix '-d' stays for 'drop'. > + (let*-values (((inds-d inds) (split-at indents (1+ index))) > + ((conds-d conds) (split-at conditionals (1+ index))) > + ((t-g?-s-d t-g?-s) > + (if (> (length true-group?-stack) index) > + (split-at true-group?-stack (1+ index)) > + (values true-group?-stack '()))) > + ((t-g-s-d t-g-s) > + (if (> (length true-group-stack) index) > + (split-at true-group-stack (1+ index)) > + (values true-group-stack '()))) > + ((f-g-s-d f-g-s) > + (if (> (length false-group-stack) index) > + (split-at false-group-stack (1+ index)) > + (values false-group-stack '()))) > + ((t-g?) > + (if (> (length true-group?-stack) index)=20 > + (last t-g?-s-d) #t)) > + ((t-g) (if (and true-group? entry) > + (cons entry true-group) > + true-group)) > + ((f-g) (if (or true-group? (not entry)) > + false-group > + (cons entry false-group))) > + ((res) result)) > + (let reduce-by-one ((conds-d conds-d) (t-g t-g) (f-g f-g) (res = res) > + (t-g?-s-d t-g?-s-d) (t-g-s-d t-g-s-d)=20 > + (f-g-s-d f-g-s-d)) This is somewhat scary ;-) but I=E2=80=99m not sure how to improve it. > + (values '*unspecified* Did you mean *unspecified*, without the quote, which evaluates to *the* unspecified value? > +(define-record-type > + (make-cabal-package name version license home-page source-repository > + synopsis description > + executables lib test-suites > + flags eval-environment) > + cabal-package? > + (name cabal-package-name) > + (version cabal-package-version) > + (license cabal-package-license) > + (home-page cabal-package-home-page) > + (source-repository cabal-package-source-repository) > + (synopsis cabal-package-synopsis) > + (description cabal-package-description) > + (executables cabal-package-executables) > + (lib cabal-package-library) ; 'library' is a Scheme keyword There are no keyboards in Scheme. :-) [...] > + (define (impl haskell) > + (let* ((haskell-implementation (or (assoc-ref env "impl") "ghc")) > + (impl-rx-result-with-version > + (string-match "([a-zA-Z0-9_]+)-([0-9.]+)" haskell-implementa= tion)) > + (impl-name (or (and=3D> impl-rx-result-with-version > + (cut match:substring <> 1)) > + haskell-implementation)) > + (impl-version (and=3D> impl-rx-result-with-version > + (cut match:substring <> 2))) > + (cabal-rx-result-with-version > + (string-match "([a-zA-Z0-9_-]+) *([<>=3D]+) *([0-9.]+) *" ha= skell)) > + (cabal-rx-result-without-version=20 > + (string-match "([a-zA-Z0-9_-]+)" haskell)) > + (cabal-impl-name (or (and=3D> cabal-rx-result-with-version > + (cut match:substring <> 1)) > + (match:substring=20 > + cabal-rx-result-without-version 1))) > + (cabal-impl-version (and=3D> cabal-rx-result-with-version > + (cut match:substring <> 3))) > + (cabal-impl-operator (and=3D> cabal-rx-result-with-version > + (cut match:substring <> 2))) > + (comparison (and=3D> cabal-impl-operator > + (cut string-append "string" <>)))) Again I feel we need one or more auxiliary procedures and/or data types here to simplify this part (fewer local variables), as well as shorter identifiers. WDYT? > --- a/tests/hackage.scm > +++ b/tests/hackage.scm > @@ -63,16 +63,13 @@ executable cabal > ") >=20=20 The existing tests here are fine, but they are more like integration tests (they test the whole pipeline.) Maybe it would be nice to directly exercise =E2=80=98read-cabal=E2=80=99 and =E2=80=98eval-cabal=E2= =80=99 individually? Thanks for all of this! Ludo=E2=80=99.