* bug#54787: importer Bioconductor: no tarball, only Git
From: zimoun @ 2022-04-08 11:48 UTC
To: 54787; +Cc: Ricardo Wurmus

Hi,

Consider the package CHETAH, included in Bioconductor release 3.14;

    <https://bioconductor.org/packages/release/bioc/html/CHETAH.html>

but then,

--8<---------------cut here---------------start------------->8---
$ guix import cran -a bioconductor CHETAH
guix import: warning: failed to retrieve package information from https://cran.r-project.org/web/packages/CHETAH/DESCRIPTION: 404 (Not Found)
guix import: error: failed to download description for package 'CHETAH'
--8<---------------cut here---------------end--------------->8---

The reason is that there is no source package, only the Git source
repo.

Cheers,
simon

* bug#54787: importer Bioconductor: no tarball, only Git
From: Ricardo Wurmus @ 2022-04-11 16:15 UTC
To: zimoun; +Cc: 54787

zimoun <zimon.toutoune@gmail.com> writes:

> $ guix import cran -a bioconductor CHETAH
> guix import: warning: failed to retrieve package information from https://cran.r-project.org/web/packages/CHETAH/DESCRIPTION: 404 (Not Found)
> guix import: error: failed to download description for package 'CHETAH'
>
> The reason is that there is no source package, only the Git source
> repo.

We should finally switch to fetching the sources from Git. I wonder why
we haven’t done this earlier.

I guess we should do this gradually to avoid mass updates, so perhaps we
should introduce bioconductor-git-reference and switch over packages one
by one.

What do you think?

--
Ricardo

* bug#54787: importer Bioconductor: no tarball, only Git
From: zimoun @ 2022-04-12 16:25 UTC
To: Ricardo Wurmus; +Cc: 54787

Hi Ricardo,

On Mon, 11 Apr 2022 at 18:15, Ricardo Wurmus <rekado@elephly.net> wrote:
> zimoun <zimon.toutoune@gmail.com> writes:
>
>> $ guix import cran -a bioconductor CHETAH
>> guix import: warning: failed to retrieve package information from
>> https://cran.r-project.org/web/packages/CHETAH/DESCRIPTION: 404 (Not Found)
>> guix import: error: failed to download description for package 'CHETAH'
>>
>> The reason is that there is no source package, only the Git source
>> repo.
>
> We should finally switch to fetching the sources from Git. I wonder why
> we haven’t done this earlier.

Because, maybe, we have just finished the janitor work cleaning the
files cran.scm, bioconductor.scm and bioinformatics.scm. :-)

> I guess we should do this gradually to avoid mass updates, so perhaps we
> should introduce bioconductor-git-reference and switch over packages one
> by one.

First, note that annotation packages do not have a Git repo; at least not
always, e.g.,

<https://bioconductor.org/packages/release/data/annotation/html/GenomeInfoDbData.html>

Second, if we go for something like:

--8<---------------cut here---------------start------------->8---
(define* (bioconductor-git-reference name
                                     #:optional (release %bioconductor-version))
  "Return a <git-reference> for the R package archive on Bioconductor for the
RELEASE corresponding to NAME."
  (git-reference
   (url (string-append %bioconductor-git-url name))
   (commit (string-append "RELEASE_"
                          (string-replace-substring release "." "_")))))
--8<---------------cut here---------------end--------------->8---

then, it raises the question: import/cran.scm or build-system/r.scm?
i.e., do we put a module dependency against (guix git-download) for the
r-build-system or not?

TeXLive already has a dependency on svn-download, so why not.

Well, I am also in favor of breaking the API and moving %bioconductor-version
and %bioconductor-url to (guix build-system r). WDYT? It would
simplify some things (#36805 and #39885), I guess.

Third, the adjustments of the importer require a large cup of coffee.

Back to CHETAH, note that

    guix import cran -a git https://git.bioconductor.org/CHETAH

works but it points to master instead of RELEASE_3_14. Well, I am not
very familiar with the Bioconductor workflow for their release.

Last, using this in gnu/packages/bioconductor.scm,

--8<---------------cut here---------------start------------->8---
(define-public r-chetah
  (package
    (name "r-chetah")
    (version "1.11.2")
    (source
     (origin
       (method git-fetch)
       (uri (bioconductor-git-reference "CHETAH"))
       (file-name (git-file-name name version))
       (sha256
        (base32 "021v5831zqdy4pirfsb35kbnz8kmz4lxqc4cwi55qgd6r081xlgh"))))
    (properties `((upstream-name . "CHETAH")))
    (build-system r-build-system)
    (propagated-inputs
     (list r-biodist
           r-corrplot
           r-cowplot
           r-dendextend
           r-ggplot2
           r-gplots
           r-pheatmap
           r-plotly
           r-reshape2
           r-s4vectors
           r-shiny
           r-singlecellexperiment
           r-summarizedexperiment))
    (native-inputs (list r-knitr))
    (home-page "https://git.bioconductor.org/packages/CHETAH")
    (synopsis "Fast and accurate scRNA-seq cell type identification")
    (description
     "CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is
an accurate, selective and fast scRNA-seq classifier. Classification is guided
by a reference dataset, preferentially also a scRNA-seq dataset. By
hierarchical clustering of the reference data, CHETAH creates a classification
tree that enables a step-wise, top-to-bottom classification. Using a novel
stopping rule, CHETAH classifies the input cells to the cell types of the
references and to \"intermediate types\": more general classifications that ended
in an intermediate node of the tree.")
    (license #f)))
--8<---------------cut here---------------end--------------->8---

it just builds with,

    ./pre-inst-env guix build r-chetah

WDYT?

Cheers,
simon

* bug#54787: importer Bioconductor: no tarball, only Git
From: Ricardo Wurmus @ 2022-04-14 11:43 UTC
To: zimoun; +Cc: 54787

zimoun <zimon.toutoune@gmail.com> writes:

> First, note that annotation packages do not have a Git repo; at least not
> always, e.g.,
>
> <https://bioconductor.org/packages/release/data/annotation/html/GenomeInfoDbData.html>

That’s fine. We just ignore annotation and experiment packages, and use
git only for regular packages.

> Second, if we go for something like:
>
> (define* (bioconductor-git-reference name
>                                      #:optional (release %bioconductor-version))
>   "Return a <git-reference> for the R package archive on Bioconductor for the
> RELEASE corresponding to NAME."
>   (git-reference
>    (url (string-append %bioconductor-git-url name))
>    (commit (string-append "RELEASE_"
>                           (string-replace-substring release "." "_")))))
>
> then, it raises the question: import/cran.scm or build-system/r.scm?
> i.e., do we put a module dependency against (guix git-download) for the
> r-build-system or not?
>
> TeXLive already has a dependency on svn-download, so why not.

Yes, I don’t think that’s a problem.

We probably should *not* use RELEASE_3_14 (or whatever) as the commit,
though, because that is a moving target. We need to resolve to the
actual commit and use its hash.

I wonder how the updater would need to be changed. It would need to
know about the release branch and look for new commits in that branch
only.

> Well, I am also in favor of breaking the API and moving %bioconductor-version
> and %bioconductor-url to (guix build-system r). WDYT? It would
> simplify some things (#36805 and #39885), I guess.

We tried this before and we couldn’t do this because of a circular
reference.

> Back to CHETAH, note that
>
>     guix import cran -a git https://git.bioconductor.org/CHETAH
>
> works but it points to master instead of RELEASE_3_14. Well, I am not
> very familiar with the Bioconductor workflow for their release.

That’s because the importer doesn’t let us specify a different branch.
We should add that, but it’s strictly separate from the migration we’re
about to embark on.

> Last, using this in gnu/packages/bioconductor.scm,
>
> (define-public r-chetah
>   (package
>     (name "r-chetah")
>     (version "1.11.2")
>     (source
>      (origin
>        (method git-fetch)
>        (uri (bioconductor-git-reference "CHETAH"))
>        (file-name (git-file-name name version))
>        (sha256
>         (base32 "021v5831zqdy4pirfsb35kbnz8kmz4lxqc4cwi55qgd6r081xlgh"))))
>     (properties `((upstream-name . "CHETAH")))
>     (build-system r-build-system)
>     (propagated-inputs
>      (list r-biodist
>            r-corrplot
>            r-cowplot
>            r-dendextend
>            r-ggplot2
>            r-gplots
>            r-pheatmap
>            r-plotly
>            r-reshape2
>            r-s4vectors
>            r-shiny
>            r-singlecellexperiment
>            r-summarizedexperiment))
>     (native-inputs (list r-knitr))
>     (home-page "https://git.bioconductor.org/packages/CHETAH")
>     (synopsis "Fast and accurate scRNA-seq cell type identification")
>     (description
>      "CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is
> an accurate, selective and fast scRNA-seq classifier. Classification is guided
> by a reference dataset, preferentially also a scRNA-seq dataset. By
> hierarchical clustering of the reference data, CHETAH creates a classification
> tree that enables a step-wise, top-to-bottom classification. Using a novel
> stopping rule, CHETAH classifies the input cells to the cell types of the
> references and to \"intermediate types\": more general classifications that ended
> in an intermediate node of the tree.")
>     (license #f)))
>
> it just builds with,
>
>     ./pre-inst-env guix build r-chetah
>
> WDYT?

Neat :)

--
Ricardo

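The resolution step described in the message above (pin the actual commit rather than the moving RELEASE_3_14 branch) can be sketched with plain git. This is only an illustration, not Guix code: the helper names `release_branch`, `commit_field` and `resolve_release_commit` are made up here, and the URL layout `https://git.bioconductor.org/packages/NAME` is assumed from the home-page field of the package definition in this thread.

```shell
#!/bin/sh
# Illustrative sketch only; the helper names are hypothetical, not part of Guix.

# Map a Bioconductor release such as "3.14" to its branch name "RELEASE_3_14".
release_branch() {
    printf 'RELEASE_%s\n' "$(printf '%s' "$1" | tr '.' '_')"
}

# Keep the first tab-separated field (the commit hash) of `git ls-remote` output.
commit_field() {
    cut -f1
}

# Print the commit currently at the head of PACKAGE's RELEASE branch.
# Needs network access; assumes repositories live under /packages/NAME.
resolve_release_commit() {
    package=$1
    release=$2
    git ls-remote "https://git.bioconductor.org/packages/$package" \
        "refs/heads/$(release_branch "$release")" | commit_field
}
```

Calling `resolve_release_commit CHETAH 3.14` would print the full hash to record in the package's `commit` field. Because the branch keeps moving, the printed value depends on when the command is run, which is exactly why the hash, not the branch name, belongs in the package definition.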
* bug#54787: importer Bioconductor: no tarball, only Git
From: zimoun @ 2022-04-14 12:59 UTC
To: Ricardo Wurmus; +Cc: 54787

Hi Ricardo,

On Thu, 14 Apr 2022 at 13:43, Ricardo Wurmus <rekado@elephly.net> wrote:

> We probably should *not* use RELEASE_3_14 (or whatever) as the commit,
> though, because that is a moving target. We need to resolve to the
> actual commit and use its hash.
>
> I wonder how the updater would need to be changed. It would need to
> know about the release branch and look for new commits in that branch
> only.

To be honest, I have not checked the Bioconductor documentation about
their Git repo structure. What I see is:

--8<---------------cut here---------------start------------->8---
$ git clone https://git.bioconductor.org/packages/CHETAH
$ cd CHETAH
$ git branch -av
* master                      5d5f5df [origin/master] Pass serialized S4 instances thru updateObject()
  remotes/origin/HEAD         -> origin/master
  remotes/origin/RELEASE_3_10 063de2d bump x.y.z version to even y prior to creation of RELEASE_3_10 branch
  remotes/origin/RELEASE_3_11 701ca7f bump x.y.z version to even y prior to creation of RELEASE_3_11 branch
  remotes/origin/RELEASE_3_12 cd3dd78 bump x.y.z version to even y prior to creation of RELEASE_3_12 branch
  remotes/origin/RELEASE_3_13 1eacdb8 bump x.y.z version to even y prior to creation of RELEASE_3_13 branch
  remotes/origin/RELEASE_3_14 03295c9 bump x.y.z version to even y prior to creation of RELEASE_3_14 branch
  remotes/origin/RELEASE_3_9  22b53f2 version bump
  remotes/origin/master       5d5f5df Pass serialized S4 instances thru updateObject()
--8<---------------cut here---------------end--------------->8---

Do we follow ’master’? Is it a mirror of what Bioconductor names their
3.14 release?

My guess was that RELEASE_3_14 mirrors their 3.14 release.

>> Well, I am also in favor of breaking the API and moving %bioconductor-version
>> and %bioconductor-url to (guix build-system r). WDYT? It would
>> simplify some things (#36805 and #39885), I guess.
>
> We tried this before and we couldn’t do this because of a circular
> reference.

Well, I have something that works. So I do not know if this circular
reference is still there.

> That’s because the importer doesn’t let us specify a different branch.
> We should add that, but it’s strictly separate from the migration we’re
> about to embark on.

I am not familiar with the updater (guix refresh -u). My plan is:

 1. Add bioconductor-git-reference
 2. Adapt the bioconductor importer.
 3. Updater?

The question is: do we have to include the migration in the updater? Or
do we do the migration by custom scripts?

Note that, because we do not support shallow clones, the complete
sources will be a bit bigger, since they contain all the Bioconductor
history of all the packages.

Cheers,
simon

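One detail visible in the branch listing of this message: RELEASE_3_9 sorts after RELEASE_3_14 alphabetically, so any tool that wants the newest release branch has to compare the version numbers numerically. A minimal sketch, assuming branch names always look like RELEASE_<major>_<minor>; `latest_release_branch` is a hypothetical helper, not an existing Guix or Bioconductor command.

```shell
#!/bin/sh
# Hypothetical helper: read branch names on stdin, one per line, and print
# the release branch with the highest version number.
latest_release_branch() {
    # Keep only names of the form RELEASE_<major>_<minor>, sort numerically
    # on major then minor, and take the last (highest) entry.
    grep -E '^RELEASE_[0-9]+_[0-9]+$' \
        | sort -t_ -k2,2n -k3,3n \
        | tail -n 1
}
```

It could be fed from the remote with something like `git ls-remote --heads https://git.bioconductor.org/packages/CHETAH | sed 's|.*refs/heads/||' | latest_release_branch`. For the branches shown in the listing it would select RELEASE_3_14. Whether the newest branch is the one Guix should track is a separate decision, since that is tied to %bioconductor-version.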
* bug#54787: importer Bioconductor: no tarball, only Git
From: Ricardo Wurmus @ 2022-04-14 13:57 UTC
To: zimoun; +Cc: 54787

zimoun <zimon.toutoune@gmail.com> writes:

> On Thu, 14 Apr 2022 at 13:43, Ricardo Wurmus <rekado@elephly.net> wrote:
>
>> We probably should *not* use RELEASE_3_14 (or whatever) as the commit,
>> though, because that is a moving target. We need to resolve to the
>> actual commit and use its hash.
>>
>> I wonder how the updater would need to be changed. It would need to
>> know about the release branch and look for new commits in that branch
>> only.
>
> To be honest, I have not checked the Bioconductor documentation about
> their Git repo structure. What I see is:
>
> $ git clone https://git.bioconductor.org/packages/CHETAH
> $ cd CHETAH
> $ git branch -av
> * master                      5d5f5df [origin/master] Pass serialized S4 instances thru updateObject()
>   remotes/origin/HEAD         -> origin/master
>   remotes/origin/RELEASE_3_10 063de2d bump x.y.z version to even y prior to creation of RELEASE_3_10 branch
>   remotes/origin/RELEASE_3_11 701ca7f bump x.y.z version to even y prior to creation of RELEASE_3_11 branch
>   remotes/origin/RELEASE_3_12 cd3dd78 bump x.y.z version to even y prior to creation of RELEASE_3_12 branch
>   remotes/origin/RELEASE_3_13 1eacdb8 bump x.y.z version to even y prior to creation of RELEASE_3_13 branch
>   remotes/origin/RELEASE_3_14 03295c9 bump x.y.z version to even y prior to creation of RELEASE_3_14 branch
>   remotes/origin/RELEASE_3_9  22b53f2 version bump
>   remotes/origin/master       5d5f5df Pass serialized S4 instances thru updateObject()
>
> Do we follow ’master’? Is it a mirror of what Bioconductor names their
> 3.14 release?

We should not follow “master”. That’s the development branch. We
should follow the current release branch.

> My guess was that RELEASE_3_14 mirrors their 3.14 release.

Correct.

>>> Well, I am also in favor of breaking the API and moving %bioconductor-version
>>> and %bioconductor-url to (guix build-system r). WDYT? It would
>>> simplify some things (#36805 and #39885), I guess.
>>
>> We tried this before and we couldn’t do this because of a circular
>> reference.
>
> Well, I have something that works. So I do not know if this circular
> reference is still there.

If “make as-derivation” does not fail it is probably okay.

>> That’s because the importer doesn’t let us specify a different branch.
>> We should add that, but it’s strictly separate from the migration we’re
>> about to embark on.
>
> I am not familiar with the updater (guix refresh -u). My plan is:
>
>  1. Add bioconductor-git-reference
>  2. Adapt the bioconductor importer.
>  3. Updater?

The updater is closely connected to the importer. It just needs to be
told how it can find new releases.

> The question is: do we have to include the migration in the updater? Or
> do we do the migration by custom scripts?

We can do the migration manually. But if we end up with a broken
updater I won’t be able to update Bioconductor packages in bulk; that
would be a serious problem for future maintenance.

> Note that, because we do not support shallow clones, the complete
> sources will be a bit bigger, since they contain all the Bioconductor
> history of all the packages.

Doesn’t Guile-Git support shallow clones? In any case, this should not
be an obstacle for us. Ensuring long-term reproducibility is more
important than space savings.

--
Ricardo

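With pinned commits, telling the updater "how it can find new releases" could amount to comparing the commit recorded in a package with the current head of the release branch. The sketch below is an assumption about one possible check, not a description of how `guix refresh` works; `update_available` is a made-up name.

```shell
#!/bin/sh
# Hypothetical check: succeed (exit 0) when HEAD differs from the PINNED commit.
# PINNED may be abbreviated, like the 7-character hashes shown by `git branch -av`.
update_available() {
    pinned=$1
    head=$2
    [ -n "$pinned" ] || return 2      # refuse an empty pinned hash
    case $head in
        "$pinned"*) return 1 ;;       # head starts with the pinned hash: up to date
        *)          return 0 ;;       # different commit: an update is available
    esac
}
```

For instance, `update_available 03295c9 "$(git ls-remote https://git.bioconductor.org/packages/CHETAH refs/heads/RELEASE_3_14 | cut -f1)"` would succeed once the release branch has moved past 03295c9. A real updater would additionally have to recompute the source hash and the package version.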
* bug#54787: importer Bioconductor: no tarball, only Git
From: zimoun @ 2022-04-14 15:03 UTC
To: Ricardo Wurmus; +Cc: 54787

On Thu, 14 Apr 2022 at 15:57, Ricardo Wurmus <rekado@elephly.net> wrote:
> zimoun <zimon.toutoune@gmail.com> writes:
>
>> On Thu, 14 Apr 2022 at 13:43, Ricardo Wurmus <rekado@elephly.net> wrote:
>>
>>> We probably should *not* use RELEASE_3_14 (or whatever) as the commit,
>>> though, because that is a moving target. We need to resolve to the
>>> actual commit and use its hash.

[...]

>> Do we follow ’master’? Is it a mirror of what Bioconductor names their
>> 3.14 release?
>
> We should not follow “master”. That’s the development branch. We
> should follow the current release branch.

To be sure I understand you correctly, your point is to have something like:

--8<---------------cut here---------------start------------->8---
(define* (bioconductor-git-reference name #:key commit)
  (git-reference
   (url (string-append %bioconductor-git-url name))
   (commit commit)))
--8<---------------cut here---------------end--------------->8---

with an explicit commit for each package definition, right?

> Doesn’t Guile-Git support shallow clones? In any case, this should not
> be an obstacle for us. Ensuring long-term reproducibility is more
> important than space savings.

No, since libgit2 does not support it, IIUC.

<https://github.com/libgit2/libgit2/issues/3058>

Cheers,
simon

* bug#54787: importer Bioconductor: no tarball, only Git
From: Maxime Devos @ 2022-04-14 14:04 UTC
To: Ricardo Wurmus, zimoun; +Cc: 54787

Ricardo Wurmus wrote on Thu, 14 Apr 2022 at 13:43 [+0200]:
> I wonder how the updater would need to be changed. It would need to
> know about the release branch and look for new commits in that branch
> only.

Perhaps <https://issues.guix.gnu.org/53144> would be useful? It adds a
'latest-git-updater' refresher that looks in a branch (or more
generally, any reference, so in principle a tag that is repeatedly
replaced would work as well) for the latest commit. There are some
unaddressed comments though ...

Greetings,
Maxime.