* bug#54787: importer Bioconductor: no tarball, only Git
@ 2022-04-08 11:48 zimoun
  2022-04-11 16:15 ` Ricardo Wurmus
  0 siblings, 1 reply; 8+ messages in thread
From: zimoun @ 2022-04-08 11:48 UTC (permalink / raw)
  To: 54787; +Cc: Ricardo Wurmus

Hi,

Consider the package CHETAH, included in Bioconductor release 3.14;

<https://bioconductor.org/packages/release/bioc/html/CHETAH.html>

but then,

--8<---------------cut here---------------start------------->8---
$ guix import cran -a bioconductor CHETAH
guix import: warning: failed to retrieve package information from https://cran.r-project.org/web/packages/CHETAH/DESCRIPTION: 404 (Not Found)
guix import: error: failed to download description for package 'CHETAH'
--8<---------------cut here---------------end--------------->8---

The reason is that there is no source tarball, only the Git source
repository.


Cheers,
simon





* bug#54787: importer Bioconductor: no tarball, only Git
  2022-04-08 11:48 bug#54787: importer Bioconductor: no tarball, only Git zimoun
@ 2022-04-11 16:15 ` Ricardo Wurmus
  2022-04-12 16:25   ` zimoun
  0 siblings, 1 reply; 8+ messages in thread
From: Ricardo Wurmus @ 2022-04-11 16:15 UTC (permalink / raw)
  To: zimoun; +Cc: 54787


zimoun <zimon.toutoune@gmail.com> writes:

> $ guix import cran -a bioconductor CHETAH
> guix import: warning: failed to retrieve package information from https://cran.r-project.org/web/packages/CHETAH/DESCRIPTION: 404 (Not Found)
> guix import: error: failed to download description for package 'CHETAH'
>
> The reason is because there is no source package.  Only the Git source
> repo.

We should finally switch to fetching the sources from Git.  I wonder why
we haven’t done this earlier.

I guess we should do this gradually to avoid mass updates, so perhaps we
should introduce bioconductor-git-reference and switch over packages one
by one.

What do you think?

-- 
Ricardo





* bug#54787: importer Bioconductor: no tarball, only Git
  2022-04-11 16:15 ` Ricardo Wurmus
@ 2022-04-12 16:25   ` zimoun
  2022-04-14 11:43     ` Ricardo Wurmus
  0 siblings, 1 reply; 8+ messages in thread
From: zimoun @ 2022-04-12 16:25 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: 54787

Hi Ricardo,

On Mon, 11 Apr 2022 at 18:15, Ricardo Wurmus <rekado@elephly.net> wrote:
> zimoun <zimon.toutoune@gmail.com> writes:
>
>> $ guix import cran -a bioconductor CHETAH
>> guix import: warning: failed to retrieve package information from
>> https://cran.r-project.org/web/packages/CHETAH/DESCRIPTION: 404 (Not Found)
>> guix import: error: failed to download description for package 'CHETAH'
>>
>> The reason is because there is no source package.  Only the Git source
>> repo.
>
> We should finally switch to fetching the sources from Git.  I wonder why
> we haven’t done this earlier.

Because, maybe, we have only just finished the janitorial work of cleaning
up the files cran.scm, bioconductor.scm and bioinformatics.scm. :-)

> I guess we should do this gradually to avoid mass updates, so perhaps we
> should introduce bioconductor-git-reference and switch over packages one
> by one.

First, note that annotation packages do not always have a Git repo,
e.g.,

<https://bioconductor.org/packages/release/data/annotation/html/GenomeInfoDbData.html>

Second, if we go for something like:

--8<---------------cut here---------------start------------->8---
;; %bioconductor-git-url is assumed to be defined next to %bioconductor-url.
(define* (bioconductor-git-reference name #:optional
                                     (release %bioconductor-version))
  "Return a <git-reference> for the Git repository of the R package NAME on
Bioconductor, pointing at the branch of RELEASE."
  (git-reference
   (url (string-append %bioconductor-git-url name))
   ;; Honor the RELEASE argument, e.g., "3.14" -> "RELEASE_3_14".
   (commit (string-append "RELEASE_" (string-replace-substring
                                      release "." "_")))))
--8<---------------cut here---------------end--------------->8---
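
The branch name built above can be double-checked from a shell.  This is
only a minimal sketch, assuming the RELEASE_X_Y naming used by the
snippet; `release_branch` is a hypothetical helper, not part of Guix.

```shell
# Hypothetical helper: map a Bioconductor release such as "3.14"
# to the corresponding branch name ("RELEASE_3_14").
release_branch() {
    printf 'RELEASE_%s\n' "$(printf '%s' "$1" | tr '.' '_')"
}

# Example:
#   release_branch 3.14
```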

then it raises the question of where to put it: import/cran.scm or
build-system/r.scm?  That is, do we make the r-build-system module depend
on (guix git-download) or not?

TeXLive already depends on svn-download, so why not.

Well, I am also in favor of breaking the API and moving %bioconductor-version
and %bioconductor-url to (guix build-system r).  WDYT?  It would
simplify some things (#36805 and #39885), I guess.


Third, the adjustments of the importer require a large cup of coffee.


Back to CHETAH, note that

   guix import cran -a git https://git.bioconductor.org/CHETAH

works but it points to master instead of RELEASE_3_14.  Well, I am not
very familiar with the Bioconductor workflow for their release.


Last, using this in gnu/packages/bioconductor.scm,

--8<---------------cut here---------------start------------->8---
(define-public r-chetah
  (package
    (name "r-chetah")
    (version "1.11.2")
    (source
     (origin
       (method git-fetch)
       (uri (bioconductor-git-reference "CHETAH"))
       (file-name (git-file-name name version))
       (sha256
        (base32 "021v5831zqdy4pirfsb35kbnz8kmz4lxqc4cwi55qgd6r081xlgh"))))
    (properties `((upstream-name . "CHETAH")))
    (build-system r-build-system)
    (propagated-inputs
     (list r-biodist
           r-corrplot
           r-cowplot
           r-dendextend
           r-ggplot2
           r-gplots
           r-pheatmap
           r-plotly
           r-reshape2
           r-s4vectors
           r-shiny
           r-singlecellexperiment
           r-summarizedexperiment))
    (native-inputs (list r-knitr))
    (home-page "https://git.bioconductor.org/packages/CHETAH")
    (synopsis "Fast and accurate scRNA-seq cell type identification")
    (description
     "CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is
an accurate, selective and fast scRNA-seq classifier.  Classification is guided
by a reference dataset, preferentially also a scRNA-seq dataset.  By
hierarchical clustering of the reference data, CHETAH creates a classification
tree that enables a step-wise, top-to-bottom classification.  Using a novel
stopping rule, CHETAH classifies the input cells to the cell types of the
references and to \"intermediate types\": more general classifications that ended
in an intermediate node of the tree.")
    (license #f)))
--8<---------------cut here---------------end--------------->8---

it just builds with,

    ./pre-inst-env guix build r-chetah



WDYT?


Cheers,
simon





* bug#54787: importer Bioconductor: no tarball, only Git
  2022-04-12 16:25   ` zimoun
@ 2022-04-14 11:43     ` Ricardo Wurmus
  2022-04-14 12:59       ` zimoun
  2022-04-14 14:04       ` Maxime Devos
  0 siblings, 2 replies; 8+ messages in thread
From: Ricardo Wurmus @ 2022-04-14 11:43 UTC (permalink / raw)
  To: zimoun; +Cc: 54787


zimoun <zimon.toutoune@gmail.com> writes:

> First, note that annotations do not have Git repo; at least not always,
> e.g.,
>
> <https://bioconductor.org/packages/release/data/annotation/html/GenomeInfoDbData.html>

That’s fine.  We just ignore annotation and experiment packages, and use
git only for regular packages.

> Second, if we go for something like:
>
> (define* (bioconductor-git-reference name #:optional
>                                      (release %bioconductor-version))
>   "Return a <git-reference> for the R package archive on Bioconductor for the
> RELEASE corresponding to NAME."
>   (git-reference
>    (url (string-append %bioconductor-git-url name))
>    (commit (string-append "RELEASE_" (string-replace-substring
>                                       %bioconductor-version "." "_")))))
>
>
> then, it raises the question: import/cran.scm or build-system/r.scm ?
> i.e., do we put a module dependency against (guix git-download) for the
> r-build-system or not?
>
> TeXLive already has a dependency to svn-download, so why not.

Yes, I don’t think that’s a problem.

We probably should *not* use RELEASE_3_14 (or whatever) as the commit,
though, because that is a moving target.  We need to resolve to the
actual commit and use its hash.
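
Resolving the branch to a commit could be sketched like this with plain
Git, assuming the repository URL pattern
https://git.bioconductor.org/packages/NAME that appears elsewhere in this
thread; `resolve_branch` is a hypothetical helper, not an existing Guix
command.

```shell
# Hypothetical helper: print the commit a remote branch currently points to.
# Usage: resolve_branch URL BRANCH
resolve_branch() {
    # "git ls-remote" prints "<commit><TAB><ref>"; keep the commit only.
    git ls-remote "$1" "refs/heads/$2" | cut -f1
}

# Example (needs network access):
#   resolve_branch https://git.bioconductor.org/packages/CHETAH RELEASE_3_14
```

The printed commit is what would go into the package definition; the branch
name itself keeps moving whenever upstream pushes to it.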

I wonder how the updater would need to be changed.  It would need to
know about the release branch and look for new commits in that branch
only.

> Well, I am also in favor to break the API and move %bioconductor-version
> and %bioconductor-url to (guix build-system r).  WDYT?  It would
> simplify some things (#36805 and #39885), I guess.

We tried this before and we couldn’t do this because of a circular
reference.

> Back to CHETAH, note that
>
>    guix import cran -a git https://git.bioconductor.org/CHETAH
>
> works but it points to master instead of RELEASE_3_14.  Well, I am not
> very familiar with the Bioconductor workflow for their release.

That’s because the importer doesn’t let us specify a different branch.
We should add that, but it’s strictly separate from the migration we’re
about to embark on.

> Last, using this in gnu/packages/bioconductor.scm,
>
> (define-public r-chetah
>   (package
>     (name "r-chetah")
>     (version "1.11.2")
>     (source
>      (origin
>        (method git-fetch)
>        (uri (bioconductor-git-reference "CHETAH"))
>        (file-name (git-file-name name version))
>        (sha256
>         (base32 "021v5831zqdy4pirfsb35kbnz8kmz4lxqc4cwi55qgd6r081xlgh"))))
>     (properties `((upstream-name . "CHETAH")))
>     (build-system r-build-system)
>     (propagated-inputs
>      (list r-biodist
>            r-corrplot
>            r-cowplot
>            r-dendextend
>            r-ggplot2
>            r-gplots
>            r-pheatmap
>            r-plotly
>            r-reshape2
>            r-s4vectors
>            r-shiny
>            r-singlecellexperiment
>            r-summarizedexperiment))
>     (native-inputs (list r-knitr))
>     (home-page "https://git.bioconductor.org/packages/CHETAH")
>     (synopsis "Fast and accurate scRNA-seq cell type identification")
>     (description
>      "CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is
> an accurate, selective and fast scRNA-seq classifier.  Classification is guided
> by a reference dataset, preferentially also a scRNA-seq dataset.  By
> hierarchical clustering of the reference data, CHETAH creates a classification
> tree that enables a step-wise, top-to-bottom classification.  Using a novel
> stopping rule, CHETAH classifies the input cells to the cell types of the
> references and to \"intermediate types\": more general classifications that ended
> in an intermediate node of the tree.")
>     (license #f)))
>
> it just builds with,
>
>     ./pre-inst-env guix build r-chetah
>
>
>
> WDYT?

Neat :)

-- 
Ricardo





* bug#54787: importer Bioconductor: no tarball, only Git
  2022-04-14 11:43     ` Ricardo Wurmus
@ 2022-04-14 12:59       ` zimoun
  2022-04-14 13:57         ` Ricardo Wurmus
  2022-04-14 14:04       ` Maxime Devos
  1 sibling, 1 reply; 8+ messages in thread
From: zimoun @ 2022-04-14 12:59 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: 54787

Hi Ricardo,

On Thu, 14 Apr 2022 at 13:43, Ricardo Wurmus <rekado@elephly.net> wrote:

> We probably should *not* use RELEASE_3_14 (or whatever) as the commit,
> though, because that is a moving target.  We need to resolve to the
> actual commit and use its hash.
>
> I wonder how the updater would need to be changed.  It would need to
> know about the release branch and look for new commits in that branch
> only.

To be honest, I have not checked the Bioconductor documentation about
their Git repo structure.  What I see is:

--8<---------------cut here---------------start------------->8---
$ git clone https://git.bioconductor.org/packages/CHETAH
$ cd CHETAH
$ git branch -av
* master                      5d5f5df [origin/master] Pass serialized S4 instances thru updateObject()
  remotes/origin/HEAD         -> origin/master
  remotes/origin/RELEASE_3_10 063de2d bump x.y.z version to even y prior to creation of RELEASE_3_10 branch
  remotes/origin/RELEASE_3_11 701ca7f bump x.y.z version to even y prior to creation of RELEASE_3_11 branch
  remotes/origin/RELEASE_3_12 cd3dd78 bump x.y.z version to even y prior to creation of RELEASE_3_12 branch
  remotes/origin/RELEASE_3_13 1eacdb8 bump x.y.z version to even y prior to creation of RELEASE_3_13 branch
  remotes/origin/RELEASE_3_14 03295c9 bump x.y.z version to even y prior to creation of RELEASE_3_14 branch
  remotes/origin/RELEASE_3_9  22b53f2 version bump
  remotes/origin/master       5d5f5df Pass serialized S4 instances thru updateObject()
--8<---------------cut here---------------end--------------->8---
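
The release branches can also be listed without a full clone.  This is a
rough sketch with plain Git, assuming the RELEASE_* naming seen above;
`list_release_branches` is a hypothetical helper.

```shell
# Hypothetical helper: list the RELEASE_* branches of a remote repository.
# Usage: list_release_branches URL
list_release_branches() {
    git ls-remote "$1" 'refs/heads/RELEASE_*' |
        cut -f2 |                  # keep the ref name, drop the commit
        sed 's|^refs/heads/||'     # strip the prefix to get the branch name
}

# Example (needs network access):
#   list_release_branches https://git.bioconductor.org/packages/CHETAH
```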


Do we follow ’master’?  Is it a mirror of what Bioconductor names their
3.14 release?

My guess was that RELEASE_3_14 mirrors their 3.14 release.


>> Well, I am also in favor to break the API and move %bioconductor-version
>> and %bioconductor-url to (guix build-system r).  WDYT?  It would
>> simplify some things (#36805 and #39885), I guess.
>
> We tried this before and we couldn’t do this because of a circular
> reference.

Well, I have something that works.  So I do not know if this circular
reference is still there.



> That’s because the importer doesn’t let us specify a different branch.
> We should add that, but it’s strictly separate from the migration we’re
> about to embark on.

I am not familiar with the updater (guix refresh -u).  My plan is:

 1. Add bioconductor-git-reference
 2. Adapt the bioconductor importer.
 3. Updater?

The question is: do we have to include the migration in the updater?  Or
do we do the migration by custom scripts?


Note that, because we do not support shallow clones, the complete
sources will be a bit bigger, since each checkout contains the whole
Bioconductor history of its package.
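
To get a rough idea of the difference, one could compare a full and a
shallow clone with plain Git.  This is only a sketch: it measures the .git
directory of ordinary clones, not what Guix's git-fetch ends up storing,
and `compare_clone_sizes` is a hypothetical helper.

```shell
# Hypothetical helper: print the size (in KiB) of the .git directory of a
# full clone and of a depth-1 clone of the same branch.
# Usage: compare_clone_sizes URL BRANCH
compare_clone_sizes() {
    workdir=$(mktemp -d) || return 1
    git clone --quiet --branch "$2" "$1" "$workdir/full" &&
        git clone --quiet --depth 1 --branch "$2" "$1" "$workdir/shallow" &&
        du -sk "$workdir/full/.git" "$workdir/shallow/.git"
    status=$?
    rm -rf "$workdir"
    return "$status"
}

# Example (needs network access):
#   compare_clone_sizes https://git.bioconductor.org/packages/CHETAH RELEASE_3_14
```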


Cheers,
simon






* bug#54787: importer Bioconductor: no tarball, only Git
  2022-04-14 12:59       ` zimoun
@ 2022-04-14 13:57         ` Ricardo Wurmus
  2022-04-14 15:03           ` zimoun
  0 siblings, 1 reply; 8+ messages in thread
From: Ricardo Wurmus @ 2022-04-14 13:57 UTC (permalink / raw)
  To: zimoun; +Cc: 54787


zimoun <zimon.toutoune@gmail.com> writes:

> On Thu, 14 Apr 2022 at 13:43, Ricardo Wurmus <rekado@elephly.net> wrote:
>
>> We probably should *not* use RELEASE_3_14 (or whatever) as the commit,
>> though, because that is a moving target.  We need to resolve to the
>> actual commit and use its hash.
>>
>> I wonder how the updater would need to be changed.  It would need to
>> know about the release branch and look for new commits in that branch
>> only.
>
> To be honest, I have not checked the Bioconductor documentation about
> their Git repo structure.  What I see is:
>
> $ git clone https://git.bioconductor.org/packages/CHETAH
> $ cd CHETAH
> $ git branch -av
> * master                      5d5f5df [origin/master] Pass serialized S4 instances thru updateObject()
>   remotes/origin/HEAD         -> origin/master
>   remotes/origin/RELEASE_3_10 063de2d bump x.y.z version to even y prior to creation of RELEASE_3_10 branch
>   remotes/origin/RELEASE_3_11 701ca7f bump x.y.z version to even y prior to creation of RELEASE_3_11 branch
>   remotes/origin/RELEASE_3_12 cd3dd78 bump x.y.z version to even y prior to creation of RELEASE_3_12 branch
>   remotes/origin/RELEASE_3_13 1eacdb8 bump x.y.z version to even y prior to creation of RELEASE_3_13 branch
>   remotes/origin/RELEASE_3_14 03295c9 bump x.y.z version to even y prior to creation of RELEASE_3_14 branch
>   remotes/origin/RELEASE_3_9  22b53f2 version bump
>   remotes/origin/master       5d5f5df Pass serialized S4 instances thru updateObject()
>
>
> Do we follow ’master’?  Is it a mirror of what Bioconductor names their
> 3.14 release?

We should not follow “master”.  That’s the development branch.  We
should follow the current release branch.

> My guess was that RELEASE_3_14 mirrors their 3.14 release.

Correct.

>>> Well, I am also in favor to break the API and move %bioconductor-version
>>> and %bioconductor-url to (guix build-system r).  WDYT?  It would
>>> simplify some things (#36805 and #39885), I guess.
>>
>> We tried this before and we couldn’t do this because of a circular
>> reference.
>
> Well, I have something that works.  So I do not know if this circular
> reference is still there.

If “make as-derivation” does not fail it is probably okay.

>> That’s because the importer doesn’t let us specify a different branch.
>> We should add that, but it’s strictly separate from the migration we’re
>> about to embark on.
>
> I am not familiar with the updater (guix refresh -u).  My plan is:
>
>  1. Add bioconductor-git-reference
>  2. Adapt the bioconductor importer.
>  3. Updater?

The updater is closely connected to the importer.  It just needs to be
told how it can find new releases.
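
As a rough illustration of what "finding new releases" means for a package
pinned to a commit of a release branch, here is a sketch with plain Git.
It is not the Guix updater; `branch_has_update` is a hypothetical helper.

```shell
# Hypothetical check: succeed (exit 0) when BRANCH at URL points to a
# commit other than PINNED_COMMIT, i.e., when there is something to update.
# Usage: branch_has_update URL BRANCH PINNED_COMMIT
branch_has_update() {
    latest=$(git ls-remote "$1" "refs/heads/$2" | cut -f1)
    # An empty result (missing branch, network error) counts as "no update".
    [ -n "$latest" ] && [ "$latest" != "$3" ]
}

# Example (needs network access; COMMIT stands for the pinned commit):
#   branch_has_update https://git.bioconductor.org/packages/CHETAH \
#       RELEASE_3_14 COMMIT && echo "update available"
```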

> The question is: do we have to include the migration in the updater?  Or
> do we do the migration by custom scripts?

We can do the migration manually.  But if we end up with a broken
updater I won’t be able to update Bioconductor packages in bulk; that
would be a serious problem for future maintenance.

> Note that, because we do not support shallow clones, the complete
> sources will be a bit bigger; since they contain all the Bioconductor
> history of all the packages.

Doesn’t Guile-Git support shallow clones?  In any case, this should not
be an obstacle for us.  Ensuring long-term reproducibility is more
important than space savings.

-- 
Ricardo





* bug#54787: importer Bioconductor: no tarball, only Git
  2022-04-14 11:43     ` Ricardo Wurmus
  2022-04-14 12:59       ` zimoun
@ 2022-04-14 14:04       ` Maxime Devos
  1 sibling, 0 replies; 8+ messages in thread
From: Maxime Devos @ 2022-04-14 14:04 UTC (permalink / raw)
  To: Ricardo Wurmus, zimoun; +Cc: 54787


Ricardo Wurmus wrote on Thu 14-04-2022 at 13:43 [+0200]:
> I wonder how the updater would need to be changed.  It would need to
> know about the release branch and look for new commits in that branch
> only.

Perhaps <https://issues.guix.gnu.org/53144> would be useful?  It adds a
'latest-git-updater' refresher that looks in a branch (or more
generally, any reference, so in principle a tag that is repeatedly
replaced would work as well) for the latest commit.  There are some
unaddressed comments though ...

Greetings,
Maxime.



* bug#54787: importer Bioconductor: no tarball, only Git
  2022-04-14 13:57         ` Ricardo Wurmus
@ 2022-04-14 15:03           ` zimoun
  0 siblings, 0 replies; 8+ messages in thread
From: zimoun @ 2022-04-14 15:03 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: 54787


On Thu, 14 Apr 2022 at 15:57, Ricardo Wurmus <rekado@elephly.net> wrote:
> zimoun <zimon.toutoune@gmail.com> writes:
>
>> On Thu, 14 Apr 2022 at 13:43, Ricardo Wurmus <rekado@elephly.net> wrote:
>>
>>> We probably should *not* use RELEASE_3_14 (or whatever) as the commit,
>>> though, because that is a moving target.  We need to resolve to the
>>> actual commit and use its hash.

[...]

>> Do we follow ’master’?  Is it a mirror of what Bioconductor names their
>> 3.14 release?
>
> We should not follow “master”.  That’s the development branch.  We
> should follow the current release branch.

To be sure I understand you correctly, your point is to have something like:

--8<---------------cut here---------------start------------->8---
  (define* (bioconductor-git-reference name #:key commit)
    (git-reference
     (url (string-append %bioconductor-git-url name))
     (commit commit)))
--8<---------------cut here---------------end--------------->8---

with an explicit commit for each package definition, right?


> Doesn’t Guile-Git support shallow clones?  In any case, this should not
> be an obstacle for us.  Ensuring long-term reproducibility is more
> important than space savings.

No, since libgit2 does not support it, IIUC.

<https://github.com/libgit2/libgit2/issues/3058>


Cheers,
simon



