unofficial mirror of help-guix@gnu.org 
From: Phil Beadling <phil@beadling.co.uk>
To: zimoun <zimon.toutoune@gmail.com>
Cc: help-guix <help-guix@gnu.org>
Subject: Re: python-pyarrow broken for parquet?
Date: Mon, 5 Jul 2021 13:15:45 +0100
Message-ID: <CAOvsyQv73JDqsKGLjZiVZiTSeVAJVGmdLgb2r=58ONAEjXwvvg@mail.gmail.com>
In-Reply-To: <CAOvsyQuX6HbKk=uyYGMAui7rDsh03Da-5d48pUdmReXNkHsAWg@mail.gmail.com>

Apologies, one tiny correction: the python-pandas input probably doesn't
need changing. In the package I swap it for a more recent custom build of
pandas that I'm using, but that's probably not necessary:
("python-pandas" ,python-pandas-simm)

On Mon, 5 Jul 2021 at 13:13, Phil Beadling <phil@beadling.co.uk> wrote:

> As promised, this works for me, but the patching of the CMake files, in
> particular the two sed commands, is very brittle with respect to any
> changes in the underlying project.  I'm not sure it should go into Guix
> proper as-is, but if people think it's useful I'm happy to submit the patch.
>
> I may try to improve on this when I have a moment, but if someone else
> wants to run with it, it's a good starting point at least.
>
> The problem is that the generated PARQUET_INCLUDE_DIR and PARQUET_LIB_DIR
> values end up concatenating both the include and lib directories
> together:
>
> For example, when debugging the CMake file, the value
> "PARQUET_INCLUDE_DIR/parquet" becomes:
> "/gnu/store/ywklhws3ccb457gsb605z95azbfpsbyl-apache-arrow-3.0.0-lib//gnu/store/zzzb4ymfj3igynsflxwxsn58kvnpa6qb-apache-arrow-3.0.0-include/share/include/parquet"
>
> The leading apache-arrow-3.0.0-lib store path shouldn't be there at all.
>
> --8<---------------cut here---------------start------------->8---
> (define-public python-pyarrow-parquet
>   (package/inherit python-pyarrow
>     (arguments
>      (substitute-keyword-arguments (package-arguments python-pyarrow)
>        ((#:phases phases)
>         `(modify-phases ,phases
>            (add-before 'install 'patch-cmake-variables
>              (lambda* (#:key inputs #:allow-other-keys)
>                ;; Replace the CMake locations with hardcoded Guix store
>                ;; paths for the underlying C++ library.  This is a pretty
>                ;; awful hack.
>                (invoke "sed" "-i"
>                        (string-append
>                         "1s#^#set(PARQUET_INCLUDE_DIR \""
>                         (assoc-ref inputs "apache-arrow:include")
>                         "/share/include\")\\n#")
>                        "cmake_modules/FindParquet.cmake")
>                (invoke "sed" "-i"
>                        (string-append
>                         "116s#^#set(PARQUET_LIB_DIR \""
>                         (assoc-ref inputs "apache-arrow:lib")
>                         "/lib\")\\n#")
>                        "cmake_modules/FindParquet.cmake")))
>            (add-before 'install 'patch-parquet-library
>              (lambda _
>                ;; Another nasty hack: there must be a better way to
>                ;; change this?
>                (substitute* "CMakeLists.txt"
>                  (("parquet_shared") "parquet"))))
>            (add-before 'install 'set-PYARROW_WITH_PARQUET
>              (lambda _
>                (setenv "PYARROW_WITH_PARQUET" "1")
>                ;; (setenv "VERBOSE" "1") ; useful for debugging CMake
>                #t))))))
>     ;; We need the includes from apache-arrow as well as its libs for
>     ;; Parquet.
>     (propagated-inputs
>      `(("python-pandas" ,python-pandas-simm)
>        ("apache-arrow:lib" ,apache-arrow "lib")
>        ("apache-arrow:include" ,apache-arrow "include")
>        ,@(fold alist-delete (package-propagated-inputs python-pyarrow)
>                '("python-pandas" "apache-arrow"))))))
> --8<---------------cut here---------------end--------------->8---
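The two sed calls in the 'patch-cmake-variables phase are easier to judge once you see the expressions they actually receive. Here is a standalone Guile sketch (no Guix needed) that rebuilds them with string-append the same way the phase does. The store paths are the ones from the debugging output above and are used purely for illustration; in the real phase they come from (assoc-ref inputs ...). sed-insert-set is a hypothetical helper introduced only for this sketch, not part of Guix. Each expression has the form "Ns#^#...\n#", which with GNU sed inserts a new set(...) line before line N of FindParquet.cmake. If upstream shifts that file around, line 116 silently stops being the right place, which is a large part of why this is brittle.

--8<---------------cut here---------------start------------->8---
;; Illustration only: store paths copied from the debugging output above.
(define include-output
  "/gnu/store/zzzb4ymfj3igynsflxwxsn58kvnpa6qb-apache-arrow-3.0.0-include")
(define lib-output
  "/gnu/store/ywklhws3ccb457gsb605z95azbfpsbyl-apache-arrow-3.0.0-lib")

;; Hypothetical helper (not part of Guix).  It builds "<line>s#^#...\n#":
;; the empty match at the start of line <line> is replaced by the set()
;; statement plus a newline, so the statement lands on its own line just
;; before the original line <line> (GNU sed accepts \n in the replacement).
(define (sed-insert-set line variable value)
  (string-append (number->string line)
                 "s#^#set(" variable " \"" value "\")\\n#"))

;; Same strings as the two string-append calls in the package above.
(define include-expr
  (sed-insert-set 1 "PARQUET_INCLUDE_DIR"
                  (string-append include-output "/share/include")))
(define lib-expr
  (sed-insert-set 116 "PARQUET_LIB_DIR"
                  (string-append lib-output "/lib")))

(display include-expr) (newline)
(display lib-expr) (newline)
--8<---------------cut here---------------end--------------->8---

Running it with "guile sketch.scm" prints both expressions. Comparing them against cmake_modules/FindParquet.cmake for a given pyarrow release is a quick way to check whether the hardcoded line number still lands where it should.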
>
> On Fri, 2 Jul 2021 at 16:34, <phil@beadling.co.uk> wrote:
>
>> Thanks Simon.
>>
>> Yep, I got this far too, and I have a candidate fix for building
>> Parquet.  But it's tremendously hacky (sed'ing hardcoded variables into
>> the CMake files to trample the derived settings in several places).  It
>> seems to work but needs finessing.  I'll post it here shortly, but I'm not
>> sure it's stable enough to go into Guix proper.  We can debate that when
>> everyone sees my horrendous fix.


Thread overview: 8+ messages
2021-06-30 18:35 python-pyarrow broken for parquet? Phil Beadling
2021-07-02 11:01 ` zimoun
2021-07-02 15:34   ` phil
2021-07-05 12:13     ` Phil Beadling
2021-07-05 12:15       ` Phil Beadling [this message]
2021-07-28 18:10         ` Ricardo Wurmus
2021-07-28 20:37           ` Phil
2021-08-17 12:29             ` zimoun
