unofficial mirror of guix-devel@gnu.org 
* Google Summer of Code 2023 Inquiry
@ 2023-03-07  1:31 Spencer Skylar Chan
  2023-03-11 13:32 ` Simon Tournier
  0 siblings, 1 reply; 18+ messages in thread
From: Spencer Skylar Chan @ 2023-03-07  1:31 UTC (permalink / raw)
  To: guix-devel

Hello Guix,

I'm a computer science major at University of Maryland and I'm 
interested in contributing to Guix through Google Summer of Code.

I've done bioinformatics research on RNA sequences using R, Python, and 
Bash. I have some experience with Racket, Rust, C, and Java as well.

I've been running the Guix package manager with Arch Linux on my work 
computer for 1 year and Guix system on my non-work computer for 1/2 
year. I've contributed some package upgrades to Guix with this email, 
and several more anonymously.

Here are some project ideas that I am considering for my proposal:

- Creating Guix manifests from `conda env export`

This was proposed in a prior email. Guix has speed and reproducibility 
benefits over Conda, and this project would ease the Conda user's 
transition to Guix.

- Project: Robustify long-term support for Reproducible Research

This was listed on the 2023 GSoC page. Besides being interested in time 
travel, sometimes I find that Conda does not retain old versions of some 
packages, so ensuring robustness for Guix packages would be great.

- Translating Snakemake to Guix Workflow Language (GWL)

This is not exactly related to Guix, but I have written some Snakemake 
workflows and my first impression of GWL is that it looks much cleaner 
than Snakemake. A workflow translator would help the Snakemake user 
transition to GWL.

Are there any issues to look at to start exploring these topics?

Thanks,
Skylar



* Re: Google Summer of Code 2023 Inquiry
  2023-03-07  1:31 Google Summer of Code 2023 Inquiry Spencer Skylar Chan
@ 2023-03-11 13:32 ` Simon Tournier
  2023-03-14 10:10   ` Simon Tournier
                     ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Simon Tournier @ 2023-03-11 13:32 UTC (permalink / raw)
  To: Spencer Skylar Chan, guix-devel; +Cc: Ricardo Wurmus, Kyle Andrews

Hi Skylar,

CC: Ricardo and Kyle

On Mon, 06 Mar 2023 at 20:31, Spencer Skylar Chan <schan12@terpmail.umd.edu> wrote:

> I've been running the Guix package manager with Arch Linux on my work 
> computer for 1 year and Guix system on my non-work computer for 1/2 
> year. I've contributed some package upgrades to Guix with this email, 
> and several more anonymously.

Cool!

Thanks for your interest.

If you can become familiar with the details, I can co-mentor some of
these, which means we’d need to find a co-mentor.  :-)


> - Creating Guix manifests from `conda env export`

It is not clear to me what the path to tackle this would be.


> - Project: Robustify long-term support for Reproducible Research

Well, from my point of view, a good way to dive in is to investigate
#51726 [1], #48540 [2], and probably more in the bug
tracker.

1: http://issues.guix.gnu.org/issue/51726
2: http://issues.guix.gnu.org/issue/48540


> - Translating Snakemake to Guix Workflow Language (GWL)

Ricardo, maybe you would have some suggestions. :-)


Feel free to ask more questions in guix-devel or guix-science or
gwl-devel mailing lists or IRC libera.chat #guix or #guix-hpc.

Cheers,
simon



* Re: Google Summer of Code 2023 Inquiry
  2023-03-11 13:32 ` Simon Tournier
@ 2023-03-14 10:10   ` Simon Tournier
  2023-03-22 17:41   ` Spencer Skylar Chan
  2023-03-22 18:19   ` Ricardo Wurmus
  2 siblings, 0 replies; 18+ messages in thread
From: Simon Tournier @ 2023-03-14 10:10 UTC (permalink / raw)
  To: Spencer Skylar Chan, guix-devel; +Cc: Ricardo Wurmus, Kyle Andrews

Hi,

On Sat, 11 Mar 2023 at 14:32, Simon Tournier <zimon.toutoune@gmail.com> wrote:

>                                     and probably more in the bug
> tracker.

For instance, you might be interested in:

    https://issues.guix.gnu.org/issue/43442#9
    https://issues.guix.gnu.org/issue/43442#11

as discussed in the recent message [1].

1: https://lists.gnu.org/archive/html/guix-devel/2023-03/msg00175.html

Feel free to ask questions in guix-devel mailing list or IRC libera.chat
#guix or #guix-hpc.

Cheers,
simon




* Re: Google Summer of Code 2023 Inquiry
  2023-03-11 13:32 ` Simon Tournier
  2023-03-14 10:10   ` Simon Tournier
@ 2023-03-22 17:41   ` Spencer Skylar Chan
  2023-03-22 18:19   ` Ricardo Wurmus
  2 siblings, 0 replies; 18+ messages in thread
From: Spencer Skylar Chan @ 2023-03-22 17:41 UTC (permalink / raw)
  To: Simon Tournier, guix-devel; +Cc: Ricardo Wurmus, Kyle Andrews

Hi Simon,

On 3/11/23 08:32, Simon Tournier wrote:
> 
>> - Creating Guix manifests from `conda env export`
> 
> It is not clear for me what would be the path to tackle this.

Here are the prior mails about this:

https://lists.gnu.org/archive/html/guix-devel/2023-03/msg00020.html
https://lists.gnu.org/archive/html/guix-devel/2023-03/msg00023.html

Another idea could be to assist with experimentally showing cases where 
Conda is not reproducible, per this thread on Guix Science:

https://lists.gnu.org/archive/html/guix-science/2023-03/msg00005.html

It seems that there's some work being planned; I wonder whether some of it 
could be extracted into a GSoC project.

> Feel free to ask more questions in guix-devel or guix-science or
> gwl-devel mailing lists or IRC libera.chat #guix or #guix-hpc.

Thanks - not super familiar with IRC yet but I'll learn!




* Re: Google Summer of Code 2023 Inquiry
  2023-03-11 13:32 ` Simon Tournier
  2023-03-14 10:10   ` Simon Tournier
  2023-03-22 17:41   ` Spencer Skylar Chan
@ 2023-03-22 18:19   ` Ricardo Wurmus
  2023-03-22 21:44     ` Spencer Skylar Chan
  2 siblings, 1 reply; 18+ messages in thread
From: Ricardo Wurmus @ 2023-03-22 18:19 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Spencer Skylar Chan, guix-devel, Kyle Andrews


>> - Translating Snakemake to Guix Workflow Language (GWL)
>
> Ricardo, maybe you would have some suggestions. :-)

Oh, this looks interesting.  Could you please elaborate on the idea?

-- 
Ricardo



* Re: Google Summer of Code 2023 Inquiry
  2023-03-22 18:19   ` Ricardo Wurmus
@ 2023-03-22 21:44     ` Spencer Skylar Chan
  2023-03-23  7:58       ` Ricardo Wurmus
  2023-03-24 18:59       ` Kyle
  0 siblings, 2 replies; 18+ messages in thread
From: Spencer Skylar Chan @ 2023-03-22 21:44 UTC (permalink / raw)
  To: Ricardo Wurmus, Simon Tournier; +Cc: guix-devel, Kyle Andrews

Hi Ricardo,

On 3/22/23 14:19, Ricardo Wurmus wrote:
> 
>>> - Translating Snakemake to Guix Workflow Language (GWL)
>>
>> Ricardo, maybe you would have some suggestions. :-)
> 
> Oh, this looks interesting.  Could you please elaborate on the idea?
> 
My idea is to take as input a Snakemake workflow file and eventually 
output an equivalent GWL workflow file.

Currently, Snakemake workflows can be exported to CWL (Common Workflow 
Language):

https://snakemake.readthedocs.io/en/stable/executing/interoperability.html

One approach could be to add CWL import/export capabilities to GWL. Then 
Snakemake/GWL conversion would be a 2 step process, using CWL as an 
intermediate step:

1. Snakemake -> CWL
2. CWL -> GWL

However, CWL is not as expressive as Snakemake. There may be some 
details that are lost from Snakemake workflows.
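
To make step 2 a little more concrete, here is a minimal Python sketch of 
how a CWL importer might begin: it reads a CWL CommandLineTool description 
and reduces it to a neutral record that a GWL emitter could later consume. 
It assumes the CWL document is supplied as JSON (valid, since JSON is a 
subset of YAML). `ProcessSpec` and `load_cwl_tool` are hypothetical names 
made up for this illustration; nothing here is existing GWL code.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ProcessSpec:
    """Hypothetical neutral description of one workflow step."""
    name: str
    command: list
    inputs: dict = field(default_factory=dict)   # input id -> CWL type
    outputs: dict = field(default_factory=dict)  # output id -> glob (or None)


def _as_map(section):
    """CWL allows map form or a list of records with `id`; normalise to a dict."""
    if isinstance(section, list):
        return {item["id"]: item for item in section}
    return dict(section or {})


def load_cwl_tool(text, name):
    """Parse a JSON-encoded CWL CommandLineTool into a ProcessSpec."""
    doc = json.loads(text)
    if doc.get("class") != "CommandLineTool":
        raise ValueError("this sketch only handles CommandLineTool documents")
    base = doc.get("baseCommand", [])
    command = [base] if isinstance(base, str) else list(base)
    inputs = {}
    for key, value in _as_map(doc.get("inputs")).items():
        # Shorthand form is `id: type`; long form is a record with a `type` field.
        inputs[key] = value if isinstance(value, str) else value.get("type")
    outputs = {}
    for key, value in _as_map(doc.get("outputs")).items():
        binding = value.get("outputBinding", {}) if isinstance(value, dict) else {}
        outputs[key] = binding.get("glob")
    return ProcessSpec(name, command, inputs, outputs)
```

Anything Snakemake-specific that CWL cannot express would already be 
missing by the time such a record is built, which is the loss of detail 
mentioned above.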

So a 1-step Snakemake/GWL transpiler could be interesting, as both 
Snakemake/GWL use a domain-specific language inside a general purpose 
language (Python/Guile respectively). There may be a possibility to 
achieve more "accurate" translations between workflows.

Is this topic something that could fit into a summer project?



* Re: Google Summer of Code 2023 Inquiry
  2023-03-22 21:44     ` Spencer Skylar Chan
@ 2023-03-23  7:58       ` Ricardo Wurmus
  2023-03-30 23:27         ` Spencer Skylar Chan
  2023-03-24 18:59       ` Kyle
  1 sibling, 1 reply; 18+ messages in thread
From: Ricardo Wurmus @ 2023-03-23  7:58 UTC (permalink / raw)
  To: Spencer Skylar Chan; +Cc: Simon Tournier, guix-devel, Kyle Andrews

Hi,

Spencer Skylar Chan <schan12@terpmail.umd.edu> writes:

> One approach could be to add CWL import/export capabilities to
> GWL. Then Snakemake/GWL conversion would be a 2 step process, using
> CWL as an intermediate step:
>
> 1. Snakemake -> CWL
> 2. CWL -> GWL

This seems doable.

> However, CWL is not as expressive as Snakemake. There may be some
> details that are lost from Snakemake workflows.
>
> So a 1-step Snakemake/GWL transpiler could be interesting, as both
> Snakemake/GWL use a domain-specific language inside a general purpose
> language (Python/Guile respectively). There may be a possibility to
> achieve more "accurate" translations between workflows.

Compared to the previous approach this seems vastly more complex.  It’s
one thing to *execute* Snakemake code without running it through Python,
but quite a bit more challenging to transpile Python to Scheme.

Personally, I wouldn’t know where to start.  Do you have an idea
already?

-- 
Ricardo



* Re: Google Summer of Code 2023 Inquiry
  2023-03-22 21:44     ` Spencer Skylar Chan
  2023-03-23  7:58       ` Ricardo Wurmus
@ 2023-03-24 18:59       ` Kyle
  2023-03-30 23:22         ` Spencer Skylar Chan
  1 sibling, 1 reply; 18+ messages in thread
From: Kyle @ 2023-03-24 18:59 UTC (permalink / raw)
  To: Spencer Skylar Chan, Ricardo Wurmus, Simon Tournier; +Cc: guix-devel


Dear Spencer, 

I am a bit worried that your proposed project is too focused on replacing Python with Guile. I think the project would benefit more from making Python users more comfortable productively using Guix tools in concert with the tools they are already comfortable with.

I'm wondering if you might consider modifying your project goals toward exploring how GWL might be enhanced so that it could better complement more expressive language-specific workflow tools like Snakemake. I am also personally interested in exploring such facilities from the targets workflow system in R as well. Alternatively, perhaps you could focus on extending the GWL with more features?

I agree that establishing an achievable scope within a short timeline is crucial. The conda env importer idea would be quite an ambitious undertaking by itself and would lead you towards thinking about some pretty interesting and impactful problems.



* Re: Google Summer of Code 2023 Inquiry
  2023-03-24 18:59       ` Kyle
@ 2023-03-30 23:22         ` Spencer Skylar Chan
  2023-03-31 15:15           ` Kyle
  0 siblings, 1 reply; 18+ messages in thread
From: Spencer Skylar Chan @ 2023-03-30 23:22 UTC (permalink / raw)
  To: Kyle, Ricardo Wurmus, Simon Tournier; +Cc: guix-devel

Hi Kyle,

On 3/24/23 14:59, Kyle wrote:
> I am a bit worried that your proposed project is too focused on 
> replacing python with guile. I think the project would benefit more from 
> making python users more comfortable productively using Guix tools in 
> concert with the tools they are already comfortable with.

Yes, I agree with you. Replacing Python with Guile is a much more 
ambitious task and is not the highest priority here.

> I'm wondering if you might consider modifying your project goals toward 
> exploring how GWL might be enhanced so that it could better complement 
> more expressive language specific workflow tools like snakemake. I am 
> also personally interested in exploring such facilities from the 
> targets workflow system in R as well. Alternatively, perhaps you could 
> focus on extending the GWL with more features?

I would also be interested in extending GWL with more features; I will 
follow up on this on the GWL mailing list.

> I agree that establishing an achievable scope within a short timeline is 
> crucial. The conda env importer idea would be quite an ambitious 
> undertaking by itself and would lead you towards thinking about some 
> pretty interesting and impactful problems.

While it's a challenging project, it could be broken into smaller steps:

1. import packages by exact matching names only, without versioning.
2. extend `guix import` to have `guix import conda` to help with package 
names that do not match exactly, and to accelerate adoption of Conda 
packages not in Guix
3. match software version numbers when translating Conda packages to Guix

What's currently undefined is the error handling:
- if a Conda package does not exist in Guix
- if the dependency graph is not solvable
- if compiling the environment fails (due to mismatching dependency 
versions)
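
As a rough illustration of step 1 and of the first error case, here is a 
minimal Python sketch. It assumes the input is the plain-text output of 
`conda env export`, handles only simple `- name=version=build` entries 
(nested `pip:` entries are skipped), and takes the set of available Guix 
package names as an argument rather than querying Guix. The function 
names are hypothetical.

```python
def parse_conda_export(text):
    """Return package names from the `dependencies:` list of a conda export."""
    names = []
    in_deps = False
    dep_indent = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        indent = len(line) - len(line.lstrip())
        if indent == 0 and not stripped.startswith("- "):
            # A top-level key either opens or closes the dependencies section.
            in_deps = stripped == "dependencies:"
            dep_indent = None
            continue
        if not in_deps or not stripped.startswith("- "):
            continue
        if dep_indent is None:
            dep_indent = indent
        if indent != dep_indent or stripped.endswith(":"):
            continue  # nested pip entries, or the `- pip:` key itself
        # Drop the version and build string: step 1 matches on names only.
        names.append(stripped[2:].split("=")[0].strip())
    return names


def to_manifest(conda_names, guix_names):
    """Return (manifest text, names with no exact match in guix_names)."""
    unique = set(conda_names)
    matched = sorted(n for n in unique if n in guix_names)
    missing = sorted(n for n in unique if n not in guix_names)
    lines = ["(specifications->manifest", "  (list"]
    lines += [f'    "{n}"' for n in matched]
    lines[-1] += "))"
    return "\n".join(lines), missing
```

With exact matching, a Conda package such as `numpy` ends up in the 
missing list because Guix names it `python-numpy`; that gap is what step 2 
would address.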

I believe there are many satisfactory stopping points for successful 
completion within the timeline of the summer, which I hope to present 
with my proposal soon.

Thanks,
Skylar


* Re: Google Summer of Code 2023 Inquiry
  2023-03-23  7:58       ` Ricardo Wurmus
@ 2023-03-30 23:27         ` Spencer Skylar Chan
  2023-03-31  0:52           ` Kyle
  0 siblings, 1 reply; 18+ messages in thread
From: Spencer Skylar Chan @ 2023-03-30 23:27 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: Simon Tournier, guix-devel, Kyle Andrews

Hi Ricardo,

On 3/23/23 03:58, Ricardo Wurmus wrote:
> Hi,
> 
> Spencer Skylar Chan <schan12@terpmail.umd.edu> writes:
> 
>> One approach could be to add CWL import/export capabilities to
>> GWL. Then Snakemake/GWL conversion would be a 2 step process, using
>> CWL as an intermediate step:
>>
>> 1. Snakemake -> CWL
>> 2. CWL -> GWL
> 
> This seems doable.

Great! I've been reading the chapter in Evolutionary Genomics on 
different scalable workflows to understand this process better.

>> However, CWL is not as expressive as Snakemake. There may be some
>> details that are lost from Snakemake workflows.
>>
>> So a 1-step Snakemake/GWL transpiler could be interesting, as both
>> Snakemake/GWL use a domain-specific language inside a general purpose
>> language (Python/Guile respectively). There may be a possibility to
>> achieve more "accurate" translations between workflows.
> 
> Compared to the previous approach this seems vastly more complex.  It’s
> one thing to *execute* Snakemake code without running it through Python,
> but quite a bit more challenging to transpile Python to Scheme.
> 
> Personally, I wouldn’t know where to start.  Do you have an idea
> already?
> 

Actually I was hoping you might have some ideas :)
I do think that if the execution of the pipeline is more important than 
its representation (Snakemake or otherwise), then it would make more 
sense to focus efforts on increasing GWL's capabilities.

Thanks,
Skylar



* Re: Google Summer of Code 2023 Inquiry
  2023-03-30 23:27         ` Spencer Skylar Chan
@ 2023-03-31  0:52           ` Kyle
  0 siblings, 0 replies; 18+ messages in thread
From: Kyle @ 2023-03-31  0:52 UTC (permalink / raw)
  To: Spencer Skylar Chan, Ricardo Wurmus; +Cc: Simon Tournier, guix-devel


As a statistician who always wants to get the most information for the least effort, I am particularly interested in being able to reprioritize workflow jobs interactively within the equivalent portions of the topological sort. I thought perhaps this would be possible with GWL if it could talk to SLURM with DRMAA version 2 (https://en.wikipedia.org/wiki/DRMAA). This would also be more readily useful to researchers if Guix had a conveniently available slurm service which worked out of the box even on a single machine. 

Stepping back, there might be a more ambitious question hidden in there in terms of how to handle indeterminism in a deterministic workflow manager. Without that external information the problem just involves choosing your random seeds up front. However,  I would prefer to write a procedure which is constantly reprioritizing labeled sub jobs within their associated containers either until I hit a resource limit or I have achieved certain target statistical diagnostics. Perhaps I would want GWL to tell me how to replay my build after the fact so I can make that reproducible even though I didn't know what I needed to focus my computations on up front and let the computer do that. Making that sort of thing possible might be a longer term effort, but working out what's needed for initial steps might be a fun project.
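
To illustrate what reprioritizing within the equivalent portions of the topological sort could mean, here is a minimal Python sketch. It is not GWL code and ignores SLURM and DRMAA entirely: it only shows that, among jobs whose dependencies are all finished, the next one can be chosen by a priority that is looked up again at every step. `run_order` and the job names are hypothetical; `graphlib` is in the standard library from Python 3.9.

```python
from graphlib import TopologicalSorter


def run_order(deps, priority):
    """Yield jobs in dependency order, preferring high-priority ready jobs.

    deps: mapping of job -> set of jobs it depends on.
    priority: callable(job) -> number; it is called afresh at each step,
    so changes made while the generator is paused take effect.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    ready = set()
    while sorter.is_active():
        ready.update(sorter.get_ready())
        # Highest priority first; ties are broken by job name for determinism.
        job = min(ready, key=lambda j: (-priority(j), j))
        ready.remove(job)
        yield job
        sorter.done(job)
```

Because the function is a generator, a caller can change the priorities between two jobs, and the order is then revised only among the jobs that are ready at that moment. The dependency order itself is never violated.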


* Re: Google Summer of Code 2023 Inquiry
  2023-03-30 23:22         ` Spencer Skylar Chan
@ 2023-03-31 15:15           ` Kyle
  2023-04-04  0:41             ` Spencer Skylar Chan
  0 siblings, 1 reply; 18+ messages in thread
From: Kyle @ 2023-03-31 15:15 UTC (permalink / raw)
  To: Spencer Skylar Chan, Ricardo Wurmus, Simon Tournier; +Cc: guix-devel


I would expect most software versions to not be in Guix. Simon had mentioned that this is mostly what the guix-past repository is for. However, some packages might be buried on some branch or some commit in some Guix related git repository. It may be helpful to facilitate their discovery and extraction for conda import.

Git has a newish binary file format for caching searches across commits. Maybe it would be helpful to figure out how to parse this format (it's documented) and index the data further using Xapian or a graph data structure (or tree sitter?) with the relevant metadata needed to find and efficiently extract Scheme code and its dependencies?

You make an interesting point about compilation errors. It may be more productive to help researchers test for working, satisfiable configurations as a more relaxed alternative to specifying the exact software version. Maybe some "nearby" or newer version is packaged and that is enough to successfully run a test suite? I'm imagining something between git bisect and Guix's own package solver. 

It might also be productive to add infrastructure to help scientists more conveniently track and study their recent packaging experiments. Guix will only become more useful as more packages become available. Work which makes packaging more approachable by more people benefits everyone. Perhaps you can think of other ideas in this direction?


* Re: Google Summer of Code 2023 Inquiry
  2023-03-31 15:15           ` Kyle
@ 2023-04-04  0:41             ` Spencer Skylar Chan
  2023-04-04  6:29               ` Kyle
  2023-04-04  8:59               ` Simon Tournier
  0 siblings, 2 replies; 18+ messages in thread
From: Spencer Skylar Chan @ 2023-04-04  0:41 UTC (permalink / raw)
  To: Kyle, Ricardo Wurmus, Simon Tournier; +Cc: guix-devel

Hi Kyle,

On 3/31/23 11:15, Kyle wrote:
> I would expect most software versions to not be in Guix. Simon had mentioned that this is mostly what the guix-past repository is for. However, some packages might be buried on some branch or some commit in some Guix related git repository. It may be helpful to facilitate their discovery and extraction for conda import.
> 
> Git has a newish binary file format for caching searches across commits. Maybe it would be helpful to figure out how to parse this format (it's documented) and index the data further using Xapian or a graph data structure (or tree sitter?) with the relevant metadata needed to find and efficiently extract Scheme code and its dependencies?

If the format is documented then this is possible, although I'm not 
super familiar with these kinds of data structures.

> You make an interesting point about compilation errors. It may be more productive to help researchers test for working, satisfiable configurations as a more relaxed alternative to specifying the exact software version. Maybe some "nearby" or newer version is packaged and that is enough to successfully run a test suite? I'm imagining something between git bisect and Guix's own package solver.

Yes, we could have a variant of the solver that's more relaxed. It could 
output multiple solutions so the user can inspect them and pick the best 
one.
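
One way to picture the "multiple solutions" idea is a ranking of the 
versions that are actually packaged by how close they are to the 
requested one. The Python sketch below is a hypothetical heuristic for 
illustration only, not something Guix does today; it assumes purely 
numeric dotted versions, and `nearby_versions` is a made-up name.

```python
def parse_version(text):
    """Split '1.24.2' into (1, 24, 2); non-numeric parts are not handled."""
    return tuple(int(part) for part in text.split("."))


def nearby_versions(requested, available, limit=3):
    """Return up to `limit` available versions, closest to `requested` first."""
    want = parse_version(requested)

    def distance(candidate):
        have = parse_version(candidate)
        # Pad with zeros so that '1.2' and '1.2.0' compare as equal.
        width = max(len(want), len(have))
        w = want + (0,) * (width - len(want))
        h = have + (0,) * (width - len(have))
        # Component-wise gaps: a major-version gap outweighs any minor one.
        gaps = tuple(abs(a - b) for a, b in zip(w, h))
        # On equal gaps, prefer the newer version over the older one.
        return (gaps, h < w)

    return sorted(available, key=distance)[:limit]
```

The user could then try the candidates in order and keep the first one 
whose test suite passes.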

> It might also be productive to add infrastructure to help scientists more conveniently track and study their recent packaging experiments. Guix will only become more useful as more packages become available. Work which makes packaging more approachable by more people benefits everyone. Perhaps you can think of other ideas in this direction?

I'm not sure how "packaging experiments" are different from packaging 
software the usual way. I think making the importers easier to use and 
debug would help, although that sounds outside the scope of the projects.

Finally, would these projects be considered large or medium for the 
purposes of GSoC?

Thanks,
Skylar

> On March 30, 2023 7:22:14 PM EDT, Spencer Skylar Chan <schan12@terpmail.umd.edu> wrote:
>> Hi Kyle,
>>
>> On 3/24/23 14:59, Kyle wrote:
>>> I am a bit worried that your proposed project is too focused on replacing Python with Guile. I think the project would benefit more from making Python users more comfortable productively using Guix tools in concert with the tools they are already comfortable with.
>>
>> Yes, I agree with you. Replacing Python with Guile is a much more ambitious task and is not the highest priority here.
>>
>>> I'm wondering if you might consider modifying your project goals toward exploring how GWL might be enhanced so that it could better complement more expressive language-specific workflow tools like Snakemake. I am also personally interested in exploring such facilities from the targets workflow system in R as well. Alternatively, perhaps you could focus on extending the GWL with more features?
>>
>> I would also be interested in extending GWL with more features; I will follow up on this on the GWL mailing list.
>>
>>> I agree that establishing an achievable scope within a short timeline is crucial. The conda env importer idea would be quite an ambitious undertaking by itself and would lead you towards thinking about some pretty interesting and impactful problems.
>>
>> While it's a challenging project, it could be broken into smaller steps:
>>
>> 1. import packages by exact matching names only, without versioning.
>> 2. extend `guix import` to have `guix import conda` to help with package names that do not match exactly, and to accelerate adoption of Conda packages not in Guix
>> 3. match software version numbers when translating Conda packages to Guix
>>
>> What's currently undefined is the error handling:
>> - if a Conda package does not exist in Guix
>> - if the dependency graph is not solvable
>> - if compiling the environment fails (due to mismatching dependency versions)
>>
>> I believe there are many satisfactory stopping points for successful completion within the timeline of the summer, which I hope to present with my proposal soon.
>>
>> Thanks,
>> Skylar
>>
>>>
>>> On March 22, 2023 5:44:52 PM EDT, Spencer Skylar Chan <schan12@terpmail.umd.edu> wrote:
>>>
>>>      Hi Ricardo,
>>>
>>>      On 3/22/23 14:19, Ricardo Wurmus wrote:
>>>
>>>
>>>                  - Translating Snakemake to Guix Workflow Language (GWL)
>>>
>>>
>>>              Ricardo, maybe you would have some suggestions. :-)
>>>
>>>
>>>          Oh, this looks interesting. Could you please elaborate on the idea?
>>>
>>>      My idea is to take as input a Snakemake workflow file and eventually output an equivalent GWL workflow file.
>>>
>>>      Currently, Snakemake workflows can be exported to CWL (Common Workflow Language):
>>>
>>>      https://snakemake.readthedocs.io/en/stable/executing/interoperability.html
>>>
>>>      One approach could be to add CWL import/export capabilities to GWL. Then Snakemake/GWL conversion would be a 2 step process, using CWL as an intermediate step:
>>>
>>>      1. Snakemake -> CWL
>>>      2. CWL -> GWL
>>>
>>>      However, CWL is not as expressive as Snakemake. There may be some details that are lost from Snakemake workflows.
>>>
>>>      So a 1-step Snakemake/GWL transpiler could be interesting, as both Snakemake/GWL use a domain-specific language inside a general purpose language (Python/Guile respectively). There may be a possibility to achieve more "accurate" translations between workflows.
>>>
>>>      Is this topic something that could fit into a summer project?
>>>
>>
> 




* Re: Google Summer of Code 2023 Inquiry
  2023-04-04  0:41             ` Spencer Skylar Chan
@ 2023-04-04  6:29               ` Kyle
  2023-04-04  8:59               ` Simon Tournier
  1 sibling, 0 replies; 18+ messages in thread
From: Kyle @ 2023-04-04  6:29 UTC (permalink / raw)
  To: Spencer Skylar Chan, Ricardo Wurmus, Simon Tournier; +Cc: guix-devel


Hi Spencer,

Here is the documentation for the git commit-graph cache file. The authors also wrote blog posts about it with a bit more explanation.

=> https://git-scm.com/docs/commit-graph
=> https://devblogs.microsoft.com/devops/updates-to-the-git-commit-graph-feature/
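
To give an idea of how approachable the format is, here is a minimal Python sketch that reads only the fixed 8-byte header of a commit-graph file. The field layout follows my reading of Git's commit-graph format documentation; the names `CommitGraphHeader` and `parse_commit_graph_header` are made up for illustration, and nothing here touches the chunk table or the commit data itself.

```python
import struct
from typing import NamedTuple


class CommitGraphHeader(NamedTuple):
    """Fixed-size header of a commit-graph file (illustrative name)."""
    version: int          # file format version
    hash_version: int     # 1 = SHA-1, 2 = SHA-256
    num_chunks: int       # number of entries in the chunk lookup table
    num_base_graphs: int  # number of base commit-graphs in a chain


def parse_commit_graph_header(data: bytes) -> CommitGraphHeader:
    """Parse the first 8 bytes: 4-byte signature, then four 1-byte fields."""
    if len(data) < 8:
        raise ValueError("commit-graph header needs at least 8 bytes")
    signature, version, hash_version, num_chunks, num_base = struct.unpack(
        ">4sBBBB", data[:8]
    )
    if signature != b"CGPH":
        raise ValueError(f"unexpected signature: {signature!r}")
    return CommitGraphHeader(version, hash_version, num_chunks, num_base)
```

In a repository where `git commit-graph write` has been run, the file normally lives at `.git/objects/info/commit-graph`, so passing the first 8 bytes of that file to `parse_commit_graph_header` should be enough to try it.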

Maybe it won't turn out to be needed... just thought it might help get you thinking. Please read all my suggestions from that perspective as a reasonable default.

I will have to defer to others for gauging the size of projects. I have found as a rule that there are always many more details to consider than I could have anticipated at the start of a project. That said, I liked your earlier stated plan of starting simple. Handling latest releases seems a reasonable minimum viable product.

Cheers,
Kyle








* Re: Google Summer of Code 2023 Inquiry
  2023-04-04  0:41             ` Spencer Skylar Chan
  2023-04-04  6:29               ` Kyle
@ 2023-04-04  8:59               ` Simon Tournier
  2023-04-04 14:32                 ` Kyle
  1 sibling, 1 reply; 18+ messages in thread
From: Simon Tournier @ 2023-04-04  8:59 UTC (permalink / raw)
  To: Spencer Skylar Chan, Kyle, Ricardo Wurmus; +Cc: guix-devel

Hi,

On Mon, 03 Apr 2023 at 20:41, Spencer Skylar Chan <schan12@terpmail.umd.edu> wrote:

>> I would expect most software versions to not be in Guix. Simon had
>> mentioned that this is mostly what the guix-past repository is
>> for. However, some packages might be buried on some branch or some
>> commit in some Guix related git repository. It may be helpful to
>> facilitate their discovery and extraction for conda import. 

Please note,

 1. The aim of the guix-past [1] channel is to keep previous versions of
    some packages working with recent Guix revisions.  The motivation
    for guix-past was the 10 Years Challenge [2], and it was then fed
    by a hackathon [3].

 2. There is no easy way to know which revision of Guix provides a
    specific version of a given package.  Mapping a package version to
    a Guix revision is not straightforward with the current tools.  I
    am aware of two directions: rely on an external server such as the
    Guix Data Service [4], or implement “guix git log” [5] (the code
    lives in the branch ’wip-guix-log’).

1: https://gitlab.inria.fr/guix-hpc/guix-past
2: http://rescience.github.io/ten-years/
3: https://hpc.guix.info/blog/2020/07/reproducible-research-hackathon-experience-report/
4: https://data.guix.gnu.org/repository/1/branch/master/package/gmsh/output-history
5: https://guix.gnu.org/en/blog/2021/outreachy-guix-git-log-internship-wrap-up/

>> Git has a newish binary file format for caching searches across
>> commits. Maybe it would be helpful to figure out how to parse this
>> format (its documented) and index the data further using Xapian or a
>> graph data structure (or tree sitter?) with the relevant metadata
>> needed to find and efficiently extract scheme code and its
>> dependencies? 

Months ago, I started to do that: index the package list using Xapian.
Well, “started” is a strong word here, since I have not done much.  My
idea was (and still is!) to address two things at the same time: a
faster “guix search” [6] and the discovery of past versions.

That would mean somehow reworking Arun’s patches [6].  From my point of
view, it would be possible to add Xapian as a dependency for Guix,
therefore I think it should use GUIX_EXTENSIONS_PATH.

6: https://issues.guix.gnu.org/39258#14


> If the format is documented then this is possible, although I'm not 
> super familiar with these kinds of data structures.

As said, a good entry point for how “guix search” works is the very
long discussion in #39258 [7]. :-)

7: https://issues.guix.gnu.org/39258


>> You make an interesting point about compilation errors. It may more
>> productive to help researchers test for working satisfiable
>> configurations as a more relaxed approach to having to specify the
>> exact software version. Maybe some "nearby" or newer version is
>> packaged and that is enough to successfully run a test suite? I'm
>> imagining something between git bisect and Guix's own package
>> solver. 
>
> Yes, we could have a variant of the solver that's more relaxed. It could 
> output multiple solutions so the user can inspect them and pick the best 
> one.

I do not know what you have in mind with “working satisfiable
configurations” or with “a variant of the solver”.  To my knowledge,
this implies some SAT solver.  Well, before going in this direction, I
would suggest reading some of the output of the Mancoosi project [8],
especially this part [9].  From my point of view, the direction “working
satisfiable configurations” or “a variant of the solver” would break the
reproducibility of a specific configuration in the general case.  Part
of the problem with computational environment reproducibility arises
precisely because package managers implement solvers for installing
packages.

That said, all the package versions that Guix can provide form a DAG,
because they come from a Git history; or rather from the combination of
several Git histories when considering several channels.  Thus, a
specific version of a package corresponds to an interval in the graph.
Considering a list of packages, each at one specific version, we end up
with a list of intervals.  The “working satisfiable configuration” is
then the intersection of all the intervals of this list; note that the
result could also be the empty interval.
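
To make the idea concrete, here is a minimal Python sketch of that intersection under a strong simplifying assumption: the history is linear, so each commit can be numbered and an interval is just a pair of indices.  The package names and the interval bounds are made-up illustration data, not taken from any Guix channel, and `intersect_intervals` is a hypothetical helper, not something Guix provides.

```python
from typing import Optional, Sequence, Tuple

# (first, last) commit index, both inclusive, on a LINEAR history.
Interval = Tuple[int, int]


def intersect_intervals(intervals: Sequence[Interval]) -> Optional[Interval]:
    """Return the range of commits common to all intervals, or None if empty."""
    if not intervals:
        return None  # no constraint given; treated as "nothing to report"
    start = max(first for first, _ in intervals)
    end = min(last for _, last in intervals)
    return (start, end) if start <= end else None


# Hypothetical data: the commit range over which each version was available.
availability = {
    "python-numpy@1.21": (120, 340),
    "r-minimal@4.1": (200, 410),
    "gmsh@4.8": (90, 260),
}

common = intersect_intervals(list(availability.values()))
print(common)  # (200, 260)
```

Any commit in the resulting range would provide all three versions at once; a `None` result corresponds to the empty interval.  With merges, the intervals become sets of commits in a DAG and this two-number representation no longer suffices.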

It is a graph problem.  It is almost trivial when the graph is linear,
but it requires some work when merges happen.  And note that merges
bring in branches that do not always fully build; for instance, parts
of core-updates before its merges.  To my knowledge, this is impossible
to detect beforehand.

We discussed this kind of topic when introducing “guix package
--export-channels”; it is a variant of this proposal, IMHO.

Last, considering all the version fields in Guix, I am not convinced it
is straightforward to guarantee some “nearby” or newer versions.  It can
only be heuristics working with more or less accuracy; see “guix
refresh” and all the updaters.

All in all, I am not convinced Guix should try to implement a way to
“specify the exact software version”, because it leads to the false
impression that version labels are enough for reproducing computational
environments, when they are far from it.

Well, I agree that Guix should only provide tools to build channels.scm
and manifest.scm files, both hinted by inputs such as requirements.txt.

And it should strongly claim that only the computational environment
generated by channels.scm+manifest.scm is reproducible.  Any other
computational environment, generated from inputs other than
channels.scm+manifest.scm, is not reproducible; this includes any
converter from whatever inputs to a generated channels.scm+manifest.scm.


8: https://www.mancoosi.org/
9: https://www.mancoosi.org/edos/algorithmic/


> Finally, would these projects be considered large or medium for the 
> purposes of GSOC?

Well, there are many ideas floating around. :-)  That’s because much
work still remains. ;-)

Many ideas discussed here are larger than GSoC.  Now, you should pick
one that interests you and where you have an idea for implementing it.

Then try to draw up a schedule to see if you think it would fit.  Please
consider that implementing always takes longer than initially planned:
there are always unexpected tiny details that block the initial plan;
devil, details and all that. ;-)


Cheers,
simon



* Re: Google Summer of Code 2023 Inquiry
  2023-04-04  8:59               ` Simon Tournier
@ 2023-04-04 14:32                 ` Kyle
  2023-04-04 17:15                   ` Simon Tournier
  0 siblings, 1 reply; 18+ messages in thread
From: Kyle @ 2023-04-04 14:32 UTC (permalink / raw)
  To: Simon Tournier, Spencer Skylar Chan, Ricardo Wurmus; +Cc: guix-devel




>I do not know what you have in mind with “working satisfiable
>configurations” or with “a variant of the solver”.  To my knowledge,
>this implies some SAT solver.  Well, before going this direction, I
>would suggest to read some output of the Mancoosi project [8].
>Especially this part [9].  From my point of view, the direction “working
>satisfiable configurations” or “a variant of the solver” would break the
>reproducibility of a specific configuration for the general case.  Part
>of the problem about computational environment reproducibility is
>because package manager implements solvers for installing some packages.

Yeah, we definitely don't want a solver for instantiating a profile. We want that to be explicit already in the manifest.scm. However, my understanding is that the role of an importer is to create a manifest.scm or, more realistically, to help a user get started creating it. There will usually need to be additional tweaking related to the intended application that the computational environment supports. The CRAN importer, for example, cannot yet detect non-R dependencies. So, the profile author has to figure those out for themselves. It's still very useful despite not being perfect. 
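
As a rough illustration of what "help a user get started" could mean, here is a small Python sketch of the simplest importer step discussed earlier in the thread: match Conda package names exactly, ignore versions, and report what could not be matched. Everything here is an assumption for illustration: `to_manifest` is a hypothetical helper, the set of known Guix names is a stand-in for a real lookup, and the dependency strings only mimic the `name=version=build` shape that `conda env export` prints. The only Guix-specific piece is `specifications->manifest`, which is the usual way to write a manifest from package names.

```python
from typing import Iterable, List, Set, Tuple


def conda_names(dependencies: Iterable[object]) -> List[str]:
    """Strip version and build strings: 'numpy=1.23.5=py310_0' -> 'numpy'."""
    # Non-string entries (such as a nested pip section) are skipped here.
    return [dep.split("=", 1)[0].strip()
            for dep in dependencies if isinstance(dep, str)]


def to_manifest(dependencies: Iterable[object],
                known_guix_names: Set[str]) -> Tuple[str, List[str]]:
    """Return (manifest text, unmatched names) using exact name matching."""
    matched: List[str] = []
    unmatched: List[str] = []
    for name in conda_names(dependencies):
        (matched if name in known_guix_names else unmatched).append(name)
    body = "".join(f'\n       "{name}"' for name in matched)
    manifest = f"(specifications->manifest\n (list{body}))\n"
    return manifest, unmatched


# Illustration data only; the 'known' set stands in for a real package lookup.
deps = ["samtools=1.16.1=h6899075_1", "bwa=0.7.17=he4a0461_11",
        "some-lab-tool=0.1=py_0"]
manifest, unmatched = to_manifest(deps, {"samtools", "bwa"})
print(manifest)
print("needs manual packaging:", unmatched)
```

The unmatched list is the part the profile author still has to work through by hand, which is exactly the "not perfect but useful" situation described above.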

>Last, considering all Guix the version fields, I am not convinced it is
>straightforward to guarantee some “nearby” or newer versions.  It can
>only be heuristics working with more or less accuracy; see “guix
>refresh” and all the updaters.

Sure, but as is shown with "guix import cran" as I previously mentioned, it doesn't have to be perfect to be really useful in many cases.

>All in all, I am not convinced Guix should try to implement a way to
>“specify the exact software version”.  Because it leads to false
>considerations that label versions are enough for reproducing
>computational environments, when it is far to be.

It definitely is not enough, but that is where it's up to the profile author to flesh out many examples of what their software is supposed to do and verify that those still work under Guix.

Having tools to benchmark against existing, but not long term reproducible, software environments would help in this import case because that is the goal with conda. Researchers should not expect to go from "good enough" for now to guaranteed reproducibility without also doing a lot of empirical testing. 

Researchers have to start somewhere, and convenience often trumps other considerations at the beginning, since most new projects fail. To get researchers to start from Guix, they need either an army of packagers willing to assist them with packaging, or so much convenience in Guix for packaging new software that it isn't much of a hassle for the researcher to do it. I hope for both, but feel that working towards the latter would bolster the chances of the former. You could imagine Xapian also being used to suggest additional package inputs, just as "guix build -f" already suggests missing Scheme modules.



* Re: Google Summer of Code 2023 Inquiry
  2023-04-04 14:32                 ` Kyle
@ 2023-04-04 17:15                   ` Simon Tournier
  0 siblings, 0 replies; 18+ messages in thread
From: Simon Tournier @ 2023-04-04 17:15 UTC (permalink / raw)
  To: Kyle, Spencer Skylar Chan, Ricardo Wurmus; +Cc: guix-devel

Hi Kyle,

On Tue, 04 Apr 2023 at 14:32, Kyle <kyle@posteo.net> wrote:

>           The CRAN importer, for example, cannot yet detect non-R
> dependencies. So, the profile author has to figure those out for
> themselves. It's still very useful despite not being perfect.  

Yeah, improving the importers is very helpful…

> Sure, but as is shown with "guix import cran" as I previously
> mentioned, it doesn't have to be perfect to be really useful in many
> cases.

…but please note the R ecosystem is probably one of the best around.

Well, I will not extrapolate to other ecosystems such as Python based
on what Lars did with the channel guix-cran [1].

For more details, have a look at this thread [2],

        Accuracy of importers?
        Ludovic Courtès <ludovic.courtes@inria.fr>
        Thu, 28 Oct 2021 09:02:27 +0200

or slide 53 of
https://git.savannah.gnu.org/cgit/guix/maintenance.git/plain/talks/packaging-con-2021/grail/talk.20211110.pdf 
  

In addition, quoting another discussion from [3]:

        Well, it strongly depends on the quality of the targeted language
        ecosystem.  For some, they provide enough metadata to rely on for good
        automation; for instance, R with CRAN or Bioconductor.

        Sadly, for many other ecosystems, they (upstream) do not provide enough
        metadata to automatically fill all the package fields.  And some manual
        tweaks are required.

        For example, let us count the number of packages that are tweaking their
        ’arguments’ fields (from ’#:tests? #f’ to complex phases modifications).
        This is far from being a perfect metric but it is a rough indication
        of upstream quality: whether they provide clean packages respecting their
        build system or whether the package requires Guix adjustments.

        Well, I get:

              r            : 2093 = 2093 = 1991 + 102 

        which is good (only ~5% require ’arguments’ tweaks), but

              python       : 2630 = 2630 = 803  + 1827

        is bad (only ~31% do not require an ’arguments’ tweak).

and the analysis can be refined, for instance: which ’arguments’
keywords are tweaked?  I did it [4] for the emacs-build-system:

                emacs        : 1234 = 1234 = 878  + 356
                    ("phases" . 213)
                    ("tests?" . 144)
                    ("test-command" . 127)
                    ("include" . 87)
                    ("emacs" . 25)
                    ("exclude" . 20)
                    ("modules" . 7)
                    ("imported-modules" . 4)
                    ("parallel-tests?" . 1) 

        Considering these 356 packages, 144 modify the keyword #:tests?.  Note
        that ’#:tests? #t’ is counted in these 144 and it reads,

            $ ag 'tests\? #t' gnu/packages/emacs-xyz.scm | wc -l
            117

        Ah!  It requires some investigations. :-)

Last, in addition to ideas of improvements provided by the thread [3,4],
the conclusion is still:

        Indeed, it could be worth identifying common sources of the extra
        modifications we are doing compared to the default emacs-build-system.

Yeah, improving the importers is very helpful! :-)

Well, considering that 95% of the current R packages in Guix just work
out-of-the-box from the CRAN metadata, and considering how many packages
guix-cran provides compared to how many packages CRAN provides, we can
roughly extrapolate the meaning of “doesn't have to be perfect” for
other ecosystems such as Python.  Roughly speaking, only about 30% of
the current Python packages in Guix work out-of-the-box.

Yeah, these numbers are very partial and finer analysis could help in
improving the importers.  But these numbers show that the conclusion
drawn from the CRAN example would not apply as-is for others, IMHO.
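
For reference, the percentages above follow directly from the counts quoted earlier; a short Python sketch of the arithmetic, using only those quoted numbers (the helper name `share_without_tweaks` is made up for illustration):

```python
# (packages without 'arguments' tweaks, packages with tweaks), as quoted above.
counts = {
    "r": (1991, 102),
    "python": (803, 1827),
    "emacs": (878, 356),
}


def share_without_tweaks(build_system: str) -> float:
    """Fraction of packages that use the build system's defaults as-is."""
    untouched, tweaked = counts[build_system]
    return untouched / (untouched + tweaked)


for name, (untouched, tweaked) in counts.items():
    total = untouched + tweaked
    print(f"{name:8s} total={total:5d} "
          f"without tweaks={100 * share_without_tweaks(name):5.1f}%")
```

This is the same rough metric as in the quoted discussion: it says nothing about why a package needs a tweak, only how often one is present.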


1: https://hpc.guix.info/blog/2022/12/cran-a-practical-example-for-being-reproducible-at-large-scale-using-gnu-guix/
2: https://yhetil.org/guix/878ryd8we4.fsf@inria.fr/#r
3: https://yhetil.org/guix/86cz9kk71y.fsf@gmail.com
4: https://yhetil.org/guix/87cz9gunwx.fsf@gmail.com


Cheers,
simon



