unofficial mirror of guix-devel@gnu.org 
From: Simon Tournier <zimon.toutoune@gmail.com>
To: Spencer Skylar Chan <schan12@terpmail.umd.edu>,
	Kyle <kyle@posteo.net>, Ricardo Wurmus <rekado@elephly.net>
Cc: guix-devel@gnu.org
Subject: Re: Google Summer of Code 2023 Inquiry
Date: Tue, 04 Apr 2023 10:59:33 +0200
Message-ID: <86ttxwvx8q.fsf@gmail.com>
In-Reply-To: <09755392-de37-c039-6b60-46310f6f4314@terpmail.umd.edu>

Hi,

On Mon, 03 Apr 2023 at 20:41, Spencer Skylar Chan <schan12@terpmail.umd.edu> wrote:

>> I would expect most software versions to not be in Guix. Simon had
>> mentioned that this is mostly what the guix-past repository is
>> for. However, some packages might be buried on some branch or some
>> commit in some Guix related git repository. It may be helpful to
>> facilitate their discovery and extraction for conda import. 

Please note,

 1. The aim of the guix-past [1] channel is to keep previous versions of
    some packages working with recent Guix revisions.  The original
    motivation for guix-past was the 10 Years Challenge [2]; it was then
    fed by a hackathon [3].

 2. There is no easy way to know which Guix revision provides a
    specific version of a given package.  Mapping package versions to
    Guix revisions is not straightforward with the current tools.  I am
    aware of two directions: rely on an external server such as the
    Guix Data Service [4], or implement “guix git log” [5] (the code
    lives in the branch ‘wip-guix-log’).

1: https://gitlab.inria.fr/guix-hpc/guix-past
2: http://rescience.github.io/ten-years/
3: https://hpc.guix.info/blog/2020/07/reproducible-research-hackathon-experience-report/
4: https://data.guix.gnu.org/repository/1/branch/master/package/gmsh/output-history
5: https://guix.gnu.org/en/blog/2021/outreachy-guix-git-log-internship-wrap-up/
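To make the second point a bit more concrete, here is a minimal Python
sketch of the kind of mapping that “guix git log” or the Data Service
would have to compute.  It assumes a hypothetical, already extracted
linear history where each commit lists the package versions it
provides; the commit names, versions, and the `version_index` helper are
made up for illustration.  In practice, obtaining that data means
evaluating each Guix revision, which is the expensive part.

```python
from typing import Dict, List, Tuple

# Hypothetical linear history (oldest first): each commit with the
# package versions it provides.  Real data would come from evaluating
# every Guix revision, which is the costly part.
HISTORY: List[Tuple[str, Dict[str, str]]] = [
    ("c1", {"gmsh": "4.6.0", "python": "3.8.2"}),
    ("c2", {"gmsh": "4.6.0", "python": "3.9.9"}),
    ("c3", {"gmsh": "4.8.4", "python": "3.9.9"}),
    ("c4", {"gmsh": "4.8.4", "python": "3.10.7"}),
]


def version_index(history):
    """Map (package, version) to the (first, last) commit indices providing it.

    Assumes a given version appears in one contiguous run of commits; a
    version that disappears and comes back would get a range spanning the gap.
    """
    index: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for i, (_commit, packages) in enumerate(history):
        for name, version in packages.items():
            first, _ = index.get((name, version), (i, i))
            index[(name, version)] = (first, i)
    return index
```

For example, `version_index(HISTORY)[("python", "3.9.9")]` is `(1, 2)`,
that is, commits c2 to c3.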

>> Git has a newish binary file format for caching searches across
>> commits. Maybe it would be helpful to figure out how to parse this
>> format (it's documented) and index the data further using Xapian or a
>> graph data structure (or tree-sitter?) with the relevant metadata
>> needed to find and efficiently extract scheme code and its
>> dependencies? 

Months ago, I started doing exactly that: indexing the package list
using Xapian.  Well, “started” is a strong word here, since I have not
done much.  My idea was (and still is!) to address two problems at the
same time: a faster “guix search” [6] and the discovery of past versions.

The plan is roughly to rework Arun’s patches [6].  From my point of
view, it would not be possible to add Xapian as a dependency of Guix;
therefore I think it should go through GUIX_EXTENSIONS_PATH.

6: https://issues.guix.gnu.org/39258#14
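To give an idea of what “indexing the package list” means, here is a
toy in-memory inverted index in Python.  It is only a stand-in for
Xapian: there is no stemming and no relevance ranking, unlike what
Xapian offers or what “guix search” does.  The package data and the
helper names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical package list: name -> synopsis.
PACKAGES = {
    "gmsh": "3D finite element grid generator",
    "python": "High-level, dynamically-typed programming language",
    "python-numpy": "Fundamental package for scientific computing with Python",
}


def tokenize(text):
    """Lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(packages):
    """Inverted index: term -> set of package names containing it."""
    index = defaultdict(set)
    for name, synopsis in packages.items():
        for term in tokenize(name) + tokenize(synopsis):
            index[term].add(name)
    return index


def search(index, query):
    """Packages matching every query term (boolean AND, no ranking)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

The point of such an index is that a lookup only touches the posting
sets of the query terms instead of scanning every package description.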


> If the format is documented then this is possible, although I'm not 
> super familiar with these kinds of data structures.

As mentioned, a good entry point for how “guix search” works is the very
long discussion in #39258 [7]. :-)

7: https://issues.guix.gnu.org/39258


>> You make an interesting point about compilation errors. It may be more
>> productive to help researchers test for working satisfiable
>> configurations as a more relaxed approach to having to specify the
>> exact software version. Maybe some "nearby" or newer version is
>> packaged and that is enough to successfully run a test suite? I'm
>> imagining something between git bisect and Guix's own package
>> solver. 
>
> Yes, we could have a variant of the solver that's more relaxed. It could 
> output multiple solutions so the user can inspect them and pick the best 
> one.

I am not sure what you have in mind with “working satisfiable
configurations” or with “a variant of the solver”.  To my knowledge,
this implies some kind of SAT solver.  Before going in this direction, I
would suggest reading some of the output of the Mancoosi project [8],
especially this part [9].  From my point of view, going toward “working
satisfiable configurations” or “a variant of the solver” would, in the
general case, break the reproducibility of a specific configuration.
Part of the problem with reproducing computational environments comes
precisely from package managers relying on solvers to install packages.

That said, the set of all package versions that Guix can provide forms a
DAG, because it is a Git history (well, the combination of several Git
histories when considering several channels).  Thus, a specific version
of a package corresponds to an interval in the graph.  Considering a
list of packages, each at one specific version, we end up with a list of
intervals.  The “working satisfiable configuration” is then the
intersection of all the intervals in this list; note that this
intersection may well be empty.
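In the linear case, the intersection boils down to taking the latest
start and the earliest end.  A minimal Python sketch, assuming each
package version is given as an inclusive (first, last) pair of commit
indices in a hypothetical linear history; the `intersect` helper is
illustrative only.

```python
from typing import List, Optional, Tuple

# Inclusive (first, last) commit indices in a linear history.
Interval = Tuple[int, int]


def intersect(intervals: List[Interval]) -> Optional[Interval]:
    """Intersection of inclusive intervals, or None when it is empty."""
    if not intervals:
        raise ValueError("at least one interval is required")
    first = max(start for start, _ in intervals)
    last = min(end for _, end in intervals)
    return (first, last) if first <= last else None
```

For example, one version available on commits 0 to 1 and another on
commits 1 to 2 give `intersect([(0, 1), (1, 2)]) == (1, 1)`, whereas
`[(0, 1), (2, 3)]` gives `None`, the empty interval.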

It is a graph problem: almost trivial when the graph is linear, but it
requires some work when merges happen.  Note also that merges bring in
branches that do not always fully build, for instance parts of
core-updates before they are merged.  To my knowledge, this is
impossible to detect beforehand.
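With merges, the commits providing a given version no longer form a
contiguous interval, so one simple approach is to work with sets of
commits instead.  The Python sketch below uses a made-up four-commit DAG
where `c` plays the role of a side branch such as core-updates and `m`
is the merge commit; the data and helper names are hypothetical.  It
intersects the commit sets, then keeps the newest candidates.  It can
only propose candidates: whether a candidate commit actually builds
still has to be checked by building it.

```python
from typing import Dict, List, Set

# Hypothetical commit DAG: commit -> parent commits.
PARENTS: Dict[str, List[str]] = {
    "a": [],
    "b": ["a"],       # master
    "c": ["a"],       # side branch, e.g. core-updates
    "m": ["b", "c"],  # merge of the side branch into master
}

# Hypothetical package versions provided by each commit.
VERSIONS: Dict[str, Dict[str, str]] = {
    "a": {"gmsh": "4.6.0", "python": "3.8.2"},
    "b": {"gmsh": "4.8.4", "python": "3.8.2"},
    "c": {"gmsh": "4.6.0", "python": "3.9.9"},
    "m": {"gmsh": "4.8.4", "python": "3.9.9"},
}


def candidates(versions, requirements) -> Set[str]:
    """Commits providing every required (package, version) pair."""
    sets = [
        {c for c, pkgs in versions.items() if pkgs.get(name) == version}
        for name, version in requirements.items()
    ]
    return set.intersection(*sets) if sets else set(versions)


def ancestors(commit, parents) -> Set[str]:
    """All strict ancestors of commit, via depth-first traversal."""
    seen: Set[str] = set()
    stack = list(parents[commit])
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def newest(cands: Set[str], parents) -> Set[str]:
    """Candidates that are not an ancestor of another candidate."""
    covered: Set[str] = set()
    for c in cands:
        covered |= ancestors(c, parents)
    return cands - covered
```

For instance, requiring gmsh 4.6.0 yields the candidates `a` and `c`,
and `newest` keeps only `c`, a commit on the side branch that may not
build.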

We discussed this kind of topic when introducing “guix package
--export-channels”, which is, IMHO, a variant of this proposal.

Last, considering all the version fields in Guix, I am not convinced it
is straightforward to guarantee “nearby” or newer versions.  It can only
be heuristics, working with more or less accuracy; see “guix refresh”
and all its updaters.

All in all, I am not convinced Guix should try to implement a way to
“specify the exact software version”, because it leads to the false
belief that version labels are enough to reproduce computational
environments, which is far from being the case.

That said, I agree that Guix should only provide tools to build
channels.scm and manifest.scm files, both derived from hints given by
some input such as requirements.txt.

And it should strongly state that only the computational environment
generated by channels.scm+manifest.scm is reproducible.  Any
computational environment generated from other inputs is not
reproducible; this includes any converter from arbitrary inputs to
generated channels.scm+manifest.scm.


8: https://www.mancoosi.org/
9: https://www.mancoosi.org/edos/algorithmic/


> Finally, would these projects be considered large or medium for the 
> purposes of GSOC?

Well, there are many ideas floating around. :-)  That’s because a lot of
work still remains. ;-)

Many ideas discussed here are larger than GSoC.  Now, you should pick
one that interests you and that you have an idea of how to implement.

Then try to draft a schedule to see whether it would fit.  Please keep
in mind that implementation always takes longer than initially planned:
there are always unexpected tiny details that block the initial plan;
devil, details and all that. ;-)


Cheers,
simon

