From mboxrd@z Thu Jan  1 00:00:00 1970
From: ludo@gnu.org (Ludovic =?utf-8?Q?Court=C3=A8s?=)
Subject: Re: hackage importer
Date: Sat, 02 May 2015 14:48:10 +0200
Message-ID: <87pp6j6t85.fsf@gnu.org>
References: <CAKrPhPNs7-A_JaQa7vVzB_hCs0rkSTC_KZWF+Yrx9kSxXeRZig@mail.gmail.com>
	<87k2yiqqaw.fsf@gnu.org>
	<CAKrPhPO91_m6F=G3CS+vFW=uF5t54=E=Dr2mtnqc0VsuLCR43w@mail.gmail.com>
	<CAKrPhPM3Hk_vpvyxYdDX3XgPnhvydaLt0HWijPoUJ8LTO4DRkQ@mail.gmail.com>
	<871tkbdhwc.fsf@gnu.org>
	<CAKrPhPNkg2pJuFH-HtqMBU2ROP8=qzxjBcWtGz_qtUQQGPcMJg@mail.gmail.com>
	<871tk7ykfj.fsf@gnu.org>
	<CAKrPhPMMAGS3DMCmGh-dhvWyXrxpY=ca8sT0YBQK=kb2Bu-sSw@mail.gmail.com>
	<87zj6t9tq5.fsf@gnu.org>
	<CAKrPhPOTKsYLp1U_oj_YP3F6akKCzB_w+=4MpJfQOpXHpXZJhQ@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Return-path: <guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org>
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38836)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <ludo@gnu.org>) id 1YoWqB-0007Kb-DE
	for guix-devel@gnu.org; Sat, 02 May 2015 08:48:21 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <ludo@gnu.org>) id 1YoWq6-0007c8-VO
	for guix-devel@gnu.org; Sat, 02 May 2015 08:48:19 -0400
Received: from fencepost.gnu.org ([2001:4830:134:3::e]:34837)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <ludo@gnu.org>)
	id 1YoWq6-0007c2-RF
	for guix-devel@gnu.org; Sat, 02 May 2015 08:48:14 -0400
In-Reply-To: <CAKrPhPOTKsYLp1U_oj_YP3F6akKCzB_w+=4MpJfQOpXHpXZJhQ@mail.gmail.com>
	(Federico Beffa's message of "Sun, 26 Apr 2015 13:38:33 +0200")
List-Id: "Development of GNU Guix and the GNU System distribution."
	<guix-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/guix-devel>,
	<mailto:guix-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/guix-devel>
List-Post: <mailto:guix-devel@gnu.org>
List-Help: <mailto:guix-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/guix-devel>,
	<mailto:guix-devel-request@gnu.org?subject=subscribe>
Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org
Sender: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org
To: Federico Beffa <beffa@ieee.org>
Cc: Guix-devel <guix-devel@gnu.org>

Federico Beffa <beffa@ieee.org> skribis:

> Please find attached a patch reorganizing the code as you suggest.

Woow, neat!  Impressive work.  I think this is a great improvement.

I have a bunch of stylistic comments below, and some open questions as
well.

> From bc8cdab1e322a25002a3d9cf33eddd856c8a81d8 Mon Sep 17 00:00:00 2001
> From: Federico Beffa <beffa@fbengineering.ch>
> Date: Sun, 26 Apr 2015 11:22:29 +0200
> Subject: [PATCH] import: hackage: Refactor parsing code and add new optio=
n.
>
> * guix/import/cabal.scm: New file.
>
> * guix/import/hackage.scm: Update to use the new Cabal parsing module.
>
> * tests/hackage.scm: Update tests for private functions.
>
> * guix/scripts/import/hackage.scm: Add new '--cabal-environment' option.
>
> * doc/guix.texi: ... and document it.
>
> * Makefile.am (MODULES): Add 'guix/import/cabal.scm',
>   'guix/import/hackage.scm' and 'guix/scripts/import/hackage.scm'.
>   (SCM_TESTS): Add 'tests/hackage.scm'.

No newlines between entries.


[...]

> +;; This record stores the state information needed during parsing of Cab=
al
> +;; files.
> +(define-record-type  <cabal-parse-state>
> +  (make-cabal-parse-state lines minimum-indent indents conditionals
> +                          true-group? true-group false-group
> +                          true-group?-stack true-group-stack false-group=
-stack)


[...]

> +            (make-cabal-parse-state lines -1 '() '() #t '() '() '() '() =
'())))

I=E2=80=99m not claiming this must done now, but it may improve readability=
 to
use =E2=80=98define-record-type*=E2=80=99.  That way, with appropriate fiel=
d default
values, one could write something like:

  (cabal-parse-state
    (lines lines))

That would also allow the use of =E2=80=98inherit=E2=80=99, which is slight=
ly less
verbose than =E2=80=98set-fields=E2=80=99.  WDYT?

Besides, could you add comments to explain the meaning of the various
fields?  I=E2=80=99m particularly curious about =E2=80=98true-group?=E2=80=
=99 & co.  ;-)

> +(define (parse-cabal result)
> +  "Parse a Cabal file and append its content to RESULT (a list).  Return=
 the
> +updated result as a monadic value in the state monad."
> +  (mlet* %state-monad ((state (current-state)))
> +    (match state
> +      (($ <cabal-parse-state> lines minimum-indent indents conditionals
> +                              true-group? true-group false-group
> +                              true-group?-stack true-group-stack
> +                              false-group-stack)
> +       (let*-values
> +           (((current-indent line)
> +             (if (null? lines)
> +                 (values 0 "")
> +                 (line-indentation+rest (first lines))))
> +            ((next-line-indent next-line)
> +             (if (or (null? lines) (null? (cdr lines)))
> +                 (values 0 "")
> +                 (line-indentation+rest (second lines))))
> +            ((key-value-rx-result) (has-key? line))
> +            ((end-of-file?) (null? lines))
> +            ((is-simple-key-value?) (and (=3D next-line-indent current-i=
ndent)
> +                                         key-value-rx-result))
> +            ((is-multi-line-key-value?) (and (> next-line-indent current=
-indent)
> +                                             key-value-rx-result))
> +            ((key) (and=3D> key-value-rx-result
> +                          (lambda (rx-res)
> +                            (string-downcase (match:substring rx-res 1))=
)))
> +            ((value) (and=3D> key-value-rx-result (cut match:substring <=
> 2))))
> +         (cond
> +          (end-of-file? (return (reverse result)))
> +          (is-simple-key-value?
> +           (>>=3D (state-add-entry (list key `(,value)) result (cdr line=
s))
> +                parse-cabal))
> +          (is-multi-line-key-value?
> +           (let*-values=20
> +               (((value-lst lines)
> +                 (multi-line-value (cdr lines)
> +                                   (if (string-null? value) '() `(,value=
)))))
> +             (>>=3D (state-add-entry (list key value-lst) result lines)
> +                  parse-cabal)))
> +          (else ; it's a section
> +           (let* ((section-header (string-tokenize (string-downcase line=
)))
> +                  (section-type (string->symbol (first section-header)))
> +                  (section-name (if (> (length section-header) 1)
> +                                     (second section-header)
> +                                     "")))
> +             (mbegin %current-monad
> +               (set-current-state=20
> +                (set-fields state
> +                            ((cabal-parse-state-minimum-indent) current-=
indent)
> +                            ((cabal-parse-state-lines) (cdr lines))))
> +               (>>=3D
> +                (>>=3D (parse-cabal-section '())
> +                     (lambda (section-contents)
> +                       (mlet* %state-monad ((state (current-state)))
> +                         (mbegin %current-monad
> +                           (set-current-state
> +                            (set-fields state
> +                                        ((cabal-parse-state-minimum-inde=
nt) -1)))
> +                           (return=20
> +                            (cons (append
> +                                   (if (string-null? section-name)
> +                                       (list 'section section-type)
> +                                       (list 'section section-type secti=
on-name))
> +                                   (list section-contents))
> +                                  result))))))
> +                parse-cabal))))))))))

This procedure is intimidating.  I think this is partly due to its
length, to the big let-values, the long identifiers, the many local
variables, nested binds, etc.

Would it be possible to create auxiliary procedures that would help?
I=E2=80=99m thinking of procedures that take a <cabal-parse-state> and retu=
rn
the necessary info, like =E2=80=98cabal-parse-state-indentation=E2=80=99,
=E2=80=98cabal-parse-state-key=E2=80=99, =E2=80=98cabal-parse-state-multili=
ne?=E2=80=99,
=E2=80=98cabal-parse-state-eof?=E2=80=99, etc.  WDYT?

Also, please try hard to avoid car and cdr and use =E2=80=98match=E2=80=99 =
instead.

(BTW it=E2=80=99s a good idea to use the state monad here!)

> +(define (parse-cabal-section result)
> +  "Parse a section of a cabal file and append its content to RESULT (a l=
ist).
> +Return the updated result as a value in the state monad."
> +  (mlet* %state-monad ((state (current-state)))
> +    (match state
> +      (($ <cabal-parse-state> lines minimum-indent indents conditionals
> +                              true-group? true-group false-group
> +                              true-group?-stack true-group-stack
> +                              false-group-stack)
> +       (let*-values
> +           (((current-indent line)
> +             (if (null? lines)
> +                 (values 0 "")
> +                 (line-indentation+rest (first lines))))
> +            ((next-line-indent next-line)
> +             (if (or (null? lines) (null? (cdr lines)))
> +                 (values 0 "")
> +                 (line-indentation+rest (second lines))))
> +            ((key-value-rx-result) (has-key? line))
> +            ((end-of-section?) (or (<=3D current-indent minimum-indent)
> +                                   (null? lines)))
> +            ;; If this is the last line of the section, then it can't be=
 the
> +            ;; start of a conditional or an 'else'.
> +            ((last-line-of-section?) (<=3D next-line-indent minimum-inde=
nt))
> +            ((is-simple-key-value?) (or (and (=3D next-line-indent curre=
nt-indent)
> +                                             key-value-rx-result)
> +                                        (and (pair? conditionals)
> +                                             (=3D next-line-indent (firs=
t indents))
> +                                             (string-prefix? "else" next=
-line))))
> +            ((is-multi-line-key-value?) (and (> next-line-indent current=
-indent)
> +                                             key-value-rx-result))
> +            ((end-of-cond?)
> +             (and (pair? conditionals)
> +                  (or (and (=3D next-line-indent (first indents))
> +                           (not (string-prefix? "else" next-line)))
> +                      (< next-line-indent (first indents)))))
> +            ((is-else?) (and (pair? conditionals)
> +                             (=3D current-indent (first indents))
> +                             (string-prefix? "else" line)))
> +            ((condition) (cabal-conditional-line->sexp line))
> +            ((key) (and=3D> key-value-rx-result
> +                          (lambda (rx-res)
> +                            (string-downcase (match:substring rx-res 1))=
)))
> +            ((value) (and=3D> key-value-rx-result
> +                            (cut match:substring <> 2))))
> +         (cond
> +          (end-of-section?
> +           (if (pair? indents)
> +               (state-reduce-indentation (1- (length indents)) #f result=
 lines)
> +               (return result)))
> +          (last-line-of-section?
> +           (if (pair? indents)
> +               (state-reduce-indentation
> +                (1- (length indents)) (list key `(,value)) result (cdr l=
ines))
> +               (mbegin %current-monad
> +                 (set-current-state=20
> +                  (set-fields state ((cabal-parse-state-lines) (cdr line=
s))))
> +                 (return (cons (list key `(,value)) result)))))
> +          (is-simple-key-value?
> +           (>>=3D (state-add-entry (list key `(,value)) result (cdr line=
s))
> +                parse-cabal-section))
> +          (is-multi-line-key-value?
> +           (let*-values
> +               ;; VALUE-LST is the full multi-line value and LINES are t=
he
> +               ;; remaining lines to be parsed (from the line following =
the
> +               ;; multi-line value).  We need to check if we are at the =
end of
> +               ;; a conditional or at the end of the section.
> +               (((value-lst lines)
> +                 (multi-line-value (cdr lines)
> +                                   (if (string-null? value) '() `(,value=
))))
> +                ((ind line) (if (null? lines)
> +                                (values 0 "")
> +                                (line-indentation+rest (first lines))))
> +                ((end-of-cond?) (and (pair? conditionals)
> +                                     (or (and (=3D ind (first indents))
> +                                              (not (string-prefix? "else=
" line)))
> +                                         (< ind (first indents)))))
> +                ;; If IND is not in INDENTS, assume that we are at the e=
nd of
> +                ;; the section.
> +                ((idx) (or (and=3D>
> +                            (list-index (cut =3D ind <>) indents)
> +                            (cut + <> (if (string-prefix? "else" line) -=
1 0)))
> +                           (1- (length indents)))))
> +             (if end-of-cond?
> +                 (>>=3D (state-reduce-indentation idx (list key value-ls=
t)
> +                                                result lines)
> +                      parse-cabal-section)
> +                 (>>=3D (state-add-entry (list key value-lst) result lin=
es)
> +                      parse-cabal-section))))
> +          (end-of-cond?
> +           (let ((idx (+ (list-index (cut =3D next-line-indent <>) inden=
ts)
> +                         (if (string-prefix? "else" next-line) -1 0))))
> +             (>>=3D (state-reduce-indentation idx (list key `(,value)) r=
esult
> +                                         (if (pair? lines) (cdr lines) '=
()))
> +                  parse-cabal-section)))
> +          (is-else?
> +           (mbegin %current-monad
> +             (set-current-state=20
> +              (set-fields state
> +                          ((cabal-parse-state-lines) (cdr lines))
> +                          ((cabal-parse-state-true-group?) #f)))
> +             (parse-cabal-section result)))
> +          (condition
> +           (mbegin %current-monad
> +             (state-add-conditional condition current-indent)
> +             (parse-cabal-section result)))))))))

This one is also very intimidating and it seems to duplicate some of the
code of the previous one, so maybe the propose <cabal-state> procedures
will help here as well.

> +(define (state-reduce-indentation index entry result lines)

s/reduce/decrease/

> +  "Given RESULT, if ENTRY is not #f, add it as appropriate and return the
> +updated result as a value in the state monad.  Update the state accordin=
g to
> +the reduction of the indentation level specified by INDEX, an index of an

s/reduction/decrease/

> +entry in the 'indentations' field of the state.

Could you explain what RESULT and ENTRY are?  Also, it seems to do two
different things: compute a value function of RESULT and ENTRY, and
update the current indentation.  Should it be two separate procedures?

> As an example, if there are
> +two nested conditional levels, the first starting at indentation 2 and t=
he
> +second at indentation 4, then the 'indentations' field of state is '(4 2=
) and
> +an INDEX value of 0 means that the second conditional is finished.  Set =
the
> +remaining lines to be parsed to LINES."
> +  (lambda (state)
> +    (match state
> +      (($ <cabal-parse-state> _ minimum-indent indents conditionals
> +                              true-group? true-group false-group
> +                              true-group?-stack true-group-stack
> +                              false-group-stack)
> +       ;; The suffix '-d' stays for 'drop'.
> +       (let*-values (((inds-d inds) (split-at indents (1+ index)))
> +                     ((conds-d conds) (split-at conditionals (1+ index)))
> +                     ((t-g?-s-d t-g?-s)
> +                      (if (> (length true-group?-stack) index)
> +                          (split-at true-group?-stack (1+ index))
> +                          (values true-group?-stack '())))
> +                     ((t-g-s-d t-g-s)
> +                      (if (> (length true-group-stack) index)
> +                          (split-at true-group-stack (1+ index))
> +                          (values true-group-stack '())))
> +                     ((f-g-s-d f-g-s)
> +                      (if (> (length false-group-stack) index)
> +                          (split-at false-group-stack (1+ index))
> +                          (values false-group-stack '())))
> +                     ((t-g?)
> +                      (if (> (length true-group?-stack) index)=20
> +                          (last t-g?-s-d) #t))
> +                     ((t-g) (if (and true-group? entry)
> +                                (cons entry true-group)
> +                                true-group))
> +                     ((f-g) (if (or true-group? (not entry))
> +                                false-group
> +                                (cons entry false-group)))
> +                     ((res) result))
> +         (let reduce-by-one ((conds-d conds-d) (t-g t-g) (f-g f-g) (res =
res)
> +                             (t-g?-s-d t-g?-s-d) (t-g-s-d t-g-s-d)=20
> +                             (f-g-s-d f-g-s-d))

This is somewhat scary ;-) but I=E2=80=99m not sure how to improve it.

> +           (values '*unspecified*

Did you mean *unspecified*, without the quote, which evaluates to *the*
unspecified value?

> +(define-record-type <cabal-package>
> +  (make-cabal-package name version license home-page source-repository
> +                      synopsis description
> +                      executables lib test-suites
> +                      flags eval-environment)
> +  cabal-package?
> +  (name   cabal-package-name)
> +  (version cabal-package-version)
> +  (license cabal-package-license)
> +  (home-page cabal-package-home-page)
> +  (source-repository cabal-package-source-repository)
> +  (synopsis cabal-package-synopsis)
> +  (description cabal-package-description)
> +  (executables cabal-package-executables)
> +  (lib cabal-package-library) ; 'library' is a Scheme keyword

There are no keyboards in Scheme.  :-)


[...]

> +  (define (impl haskell)
> +    (let* ((haskell-implementation (or (assoc-ref env "impl") "ghc"))
> +           (impl-rx-result-with-version
> +            (string-match "([a-zA-Z0-9_]+)-([0-9.]+)" haskell-implementa=
tion))
> +           (impl-name (or (and=3D> impl-rx-result-with-version
> +                                 (cut match:substring <> 1))
> +                          haskell-implementation))
> +           (impl-version (and=3D> impl-rx-result-with-version
> +                                (cut match:substring <> 2)))
> +           (cabal-rx-result-with-version
> +            (string-match "([a-zA-Z0-9_-]+) *([<>=3D]+) *([0-9.]+) *" ha=
skell))
> +           (cabal-rx-result-without-version=20
> +            (string-match "([a-zA-Z0-9_-]+)" haskell))
> +           (cabal-impl-name (or (and=3D> cabal-rx-result-with-version
> +                                       (cut match:substring <> 1))
> +                                (match:substring=20
> +                                 cabal-rx-result-without-version 1)))
> +           (cabal-impl-version (and=3D> cabal-rx-result-with-version
> +                                      (cut match:substring <> 3)))
> +           (cabal-impl-operator (and=3D> cabal-rx-result-with-version
> +                                       (cut match:substring <> 2)))
> +           (comparison (and=3D> cabal-impl-operator
> +                              (cut string-append "string" <>))))

Again I feel we need one or more auxiliary procedures and/or data types
here to simplify this part (fewer local variables), as well as shorter
identifiers.  WDYT?

> --- a/tests/hackage.scm
> +++ b/tests/hackage.scm
> @@ -63,16 +63,13 @@ executable cabal
>  ")
>=20=20

The existing tests here are fine, but they are more like integration
tests (they test the whole pipeline.)  Maybe it would be nice to
directly exercise =E2=80=98read-cabal=E2=80=99 and =E2=80=98eval-cabal=E2=
=80=99 individually?

Thanks for all of this!

Ludo=E2=80=99.