Date: Tue, 04 Apr 2023 06:29:00 +0000
From: Kyle
To: Spencer Skylar Chan, Ricardo Wurmus, Simon Tournier
CC: guix-devel@gnu.org
Subject: Re: Google Summer of Code 2023 Inquiry

Hi Spencer,

Here is the documentation for the git commit-graph cache file. The authors also wrote their own blog posts about it, with a bit more explanation.

=> https://git-scm.com/docs/commit-graph
=> https://devblogs.microsoft.com/devops/updates-to-the-git-commit-graph-feature/
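
For a sense of what parsing the commit-graph file involves, here is a minimal Python sketch based on the format description at the first link: an 8-byte header (signature `CGPH`, version, hash version, chunk count, base-graph count) followed by a table of 12-byte chunk entries, with all integers big-endian. The names `parse_commit_graph_header` and `commit_count` are made up for illustration and are not part of git. The sketch only reads the table of contents and the OID fanout chunk, and ignores split commit-graph chains.

```python
import struct
from dataclasses import dataclass

# Hash version byte -> length in bytes of one object ID.
HASH_LENGTHS = {1: 20, 2: 32}  # 1 = SHA-1, 2 = SHA-256


@dataclass
class CommitGraphHeader:
    version: int
    hash_len: int
    num_base_graphs: int
    chunks: dict  # chunk ID such as b"OIDF" -> (start offset, end offset)


def parse_commit_graph_header(data: bytes) -> CommitGraphHeader:
    if data[:4] != b"CGPH":
        raise ValueError("not a commit-graph file (bad signature)")
    version, hash_version, num_chunks, num_base = struct.unpack_from(">BBBB", data, 4)
    if version != 1:
        raise ValueError(f"unsupported commit-graph version {version}")
    if hash_version not in HASH_LENGTHS:
        raise ValueError(f"unknown hash version {hash_version}")
    # Table of contents: num_chunks + 1 entries, each a 4-byte chunk ID and an
    # 8-byte big-endian offset. The final entry has ID 0 and marks the end of
    # the last chunk, so each chunk ends where the next one starts.
    entries = [struct.unpack_from(">4sQ", data, 8 + 12 * i) for i in range(num_chunks + 1)]
    if entries[-1][0] != b"\x00\x00\x00\x00":
        raise ValueError("chunk table is not terminated")
    chunks = {cid: (start, entries[i + 1][1]) for i, (cid, start) in enumerate(entries[:-1])}
    return CommitGraphHeader(version, HASH_LENGTHS[hash_version], num_base, chunks)


def commit_count(data: bytes, header: CommitGraphHeader) -> int:
    # OIDF holds 256 cumulative counts; entry 255 is the total number of commits.
    start, _end = header.chunks[b"OIDF"]
    (total,) = struct.unpack_from(">I", data, start + 255 * 4)
    return total
```

In a repository where `git commit-graph write --reachable` has been run, the file usually lives at `.git/objects/info/commit-graph` and can be loaded with `open(path, "rb").read()`.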

Maybe it won't turn out to be needed... I just thought it might help get you thinking. Please read all my suggestions from that perspective, as a reasonable default.

I will have to defer to others for gauging the size of projects. I have found, as a rule, that there are always many more details to consider than I could have anticipated at the start of a project. That said, I liked your earlier stated plan of starting simple. Handling the latest releases seems a reasonable minimum viable product.

Cheers,
Kyle

On April 3, 2023 8:41:53 PM EDT, Spencer Skylar Chan <schan12@terpmail.umd.edu> wrote:
> Hi Kyle,
>
> On 3/31/23 11:15, Kyle wrote:
>> I would expect most software versions to not be in Guix. Simon had mentioned that this is mostly what the guix-past repository is for. However, some packages might be buried on some branch or some commit in some Guix related git repository. It may be helpful to facilitate their discovery and extraction for conda import.
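
As one possible starting point for the discovery part, git's pickaxe option (`git log -S<string>`) lists commits that add or remove a given string. The Python sketch below assumes package definitions are written with a literal `(name "<package>")` field under `gnu/packages`, which holds for the main Guix repository but may not for every channel, and it will miss names that are computed rather than written out. `pickaxe_command`, `parse_log`, and `find_package_commits` are hypothetical helper names.

```python
import subprocess


def pickaxe_command(package_name: str, path: str = "gnu/packages") -> list:
    # -S matches commits that change the number of occurrences of the string,
    # i.e. commits where a (name "...") field was added or removed.
    # --all searches every ref, so definitions that only live on other
    # branches are found too.
    needle = f'(name "{package_name}")'
    return ["git", "log", "--all", f"-S{needle}",
            "--format=%H %ad", "--date=short", "--", path]


def parse_log(output: str) -> list:
    # Each non-empty line is "<commit hash> <YYYY-MM-DD>".
    commits = []
    for line in output.splitlines():
        if line.strip():
            sha, date = line.split(" ", 1)
            commits.append((sha, date))
    return commits


def find_package_commits(repo_dir: str, package_name: str) -> list:
    # Runs git inside repo_dir; raises CalledProcessError if git fails.
    result = subprocess.run(pickaxe_command(package_name), cwd=repo_dir,
                            capture_output=True, text=True, check=True)
    return parse_log(result.stdout)
```

Each hit could then be read back with `git show <commit>:<file>` to extract the definition itself.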

>> Git has a newish binary file format for caching searches across commits. Maybe it would be helpful to figure out how to parse this format (it's documented) and index the data further using Xapian or a graph data structure (or tree-sitter?) with the relevant metadata needed to find and efficiently extract Scheme code and its dependencies?
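
To make the "graph data structure" idea more concrete: according to the commit-graph documentation, the CDAT chunk stores one fixed-size record per commit (root tree ID, two parent positions, then two 32-bit words packing a 30-bit topological level and a 34-bit commit time). The Python sketch below decodes those records and builds a reverse parent/child index. It is an illustration under those assumptions, not a complete reader: merges with more than two parents only record where their extra parents start in the EDGE chunk, and `decode_commit_data` and `children_index` are made-up names.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional

PARENT_NONE = 0x70000000   # "no parent in this slot"
OCTOPUS_BIT = 0x80000000   # second parent points into the EDGE chunk instead


@dataclass
class CommitEntry:
    tree: bytes                       # object ID of the root tree
    parents: List[int]                # positions in the OID lookup table
    extra_edge_index: Optional[int]   # set only for merges with 3+ parents
    level: int                        # topological level (generation v1)
    commit_time: int                  # seconds since the epoch


def decode_commit_data(cdat: bytes, hash_len: int) -> List[CommitEntry]:
    entry_size = hash_len + 16
    entries = []
    for base in range(0, len(cdat), entry_size):
        tree = cdat[base:base + hash_len]
        p1, p2, word1, word2 = struct.unpack_from(">IIII", cdat, base + hash_len)
        parents = [] if p1 == PARENT_NONE else [p1]
        extra = None
        if p2 & OCTOPUS_BIT:
            extra = p2 & 0x7FFFFFFF
        elif p2 != PARENT_NONE:
            parents.append(p2)
        # Upper 30 bits of word1: level. Lowest 2 bits of word1 are bits 33-34
        # of the commit time; word2 holds its lower 32 bits.
        level = word1 >> 2
        commit_time = ((word1 & 0x3) << 32) | word2
        entries.append(CommitEntry(tree, parents, extra, level, commit_time))
    return entries


def children_index(entries: List[CommitEntry]) -> dict:
    # Reverse the parent edges so descendants can be walked as well.
    children = {i: [] for i in range(len(entries))}
    for i, entry in enumerate(entries):
        for p in entry.parents:
            children[p].append(i)
    return children
```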

> If the format is documented then this is possible, although I'm not super familiar with these kinds of data structures.

>> You make an interesting point about compilation errors. It may be more productive to help researchers test for working, satisfiable configurations, as a more relaxed approach than having to specify the exact software version. Maybe some "nearby" or newer version is packaged and that is enough to successfully run a test suite? I'm imagining something between git bisect and Guix's own package solver.

> Yes, we could have a variant of the solver that's more relaxed. It could output multiple solutions so the user can inspect them and pick the best one.
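
A relaxed matcher of the kind described above could start very small. The Python sketch below ranks the versions available for a package by how close they are to the requested one, and returns several candidates rather than a single answer. The ordering (longest shared version prefix first, then newer before older, then nearest first) is only an assumed heuristic, and `rank_candidates` is a hypothetical name. A fuller version might run the project's test suite against each candidate, in the spirit of git bisect.

```python
import re
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    # Keep the leading numeric part of each dot-separated component and stop
    # at the first non-numeric one: "1.24.2" -> (1, 24, 2), "2.0rc1" -> (2, 0).
    # A deliberate simplification; real version schemes vary a lot.
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def rank_candidates(requested: str, available: List[str], limit: int = 3) -> List[str]:
    want = parse_version(requested)

    def closeness(candidate: str):
        have = parse_version(candidate)
        shared = 0  # length of the common leading prefix, e.g. same major.minor
        for a, b in zip(want, have):
            if a != b:
                break
            shared += 1
        older = have < want
        # Among newer versions prefer the smallest; among older ones the largest.
        distance = tuple(-x for x in have) if older else have
        return (-shared, older, distance)

    return sorted(available, key=closeness)[:limit]
```

For a request of `1.24.2` against `1.21.1`, `1.23.5`, `1.24.4`, `1.25.0`, and `2.0.0`, this returns `1.24.4`, `1.25.0`, and `1.23.5`, leaving the final choice to the user.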

>> It might also be productive to add infrastructure to help scientists more conveniently track and study their recent packaging experiments. Guix only becomes more useful as more packages become available. Work which makes packaging more approachable for more people benefits everyone. Perhaps you can think of other ideas in this direction?

> I'm not sure how "packaging experiments" are different from packaging software the usual way. I think making the importers easier to use and debug would help, although that sounds outside the scope of the projects.

> Finally, would these projects be considered large or medium for the purposes of GSOC?

> Thanks,
> Skylar

>> On March 30, 2023 7:22:14 PM EDT, Spencer Skylar Chan <schan12@terpmail.umd.edu> wrote:
>>> Hi Kyle,

>>> On 3/24/23 14:59, Kyle wrote:
>>>> I am a bit worried that your proposed project is too focused on replacing Python with Guile. I think the project would benefit more from making Python users more comfortable productively using Guix tools in concert with the tools they are already comfortable with.

>>> Yes, I agree with you. Replacing Python with Guile is a much more ambitious task and is not the highest priority here.

>>>> I'm wondering if you might consider modifying your project goals toward exploring how GWL might be enhanced so that it could better complement more expressive language-specific workflow tools like Snakemake. I am also personally interested in exploring such facilities for the targets workflow system in R. Alternatively, perhaps you could focus on extending the GWL with more features?

>>> I would also be interested in extending GWL with more features; I will follow up on this on the GWL mailing list.

>>>> I agree that establishing an achievable scope within a short timeline is crucial. The conda env importer idea would be quite an ambitious undertaking by itself and would lead you towards thinking about some pretty interesting and impactful problems.

>>> While it's a challenging project, it could be broken into smaller steps:

>>> 1. Import packages by exact matching names only, without versioning.
>>> 2. Extend `guix import` to have `guix import conda`, to help with package names that do not match exactly and to accelerate adoption of Conda packages not in Guix.
>>> 3. Match software version numbers when translating Conda packages to Guix.
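
Steps 1 and 3 might look roughly like this in Python. The sketch assumes the environment was exported with `conda list --export` (lines of the form `name=version=build`, plus `#` comment lines) and that the set of Guix package names is supplied by the caller, for example collected from `guix package --list-available`. `parse_conda_export` and `to_guix_manifest` are hypothetical names. The output is a manifest built with `specifications->manifest`, using Guix's `name@version` specifications when versions are requested.

```python
from typing import Iterable, List, Set, Tuple


def parse_conda_export(text: str) -> List[Tuple[str, str]]:
    # Lines look like "numpy=1.24.2=py311h8e6699e_0"; the build string is dropped.
    specs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("=")
        version = parts[1] if len(parts) > 1 else ""
        specs.append((parts[0], version))
    return specs


def to_guix_manifest(specs: Iterable[Tuple[str, str]], guix_names: Set[str],
                     with_versions: bool = False) -> Tuple[str, List[str]]:
    found, missing = [], []
    for name, version in specs:
        if name in guix_names:  # step 1: exact name match only
            # Step 3: pin the version with Guix's "name@version" syntax.
            found.append(f"{name}@{version}" if with_versions and version else name)
        else:
            missing.append(name)
    quoted = " ".join(f'"{spec}"' for spec in found)
    return f"(specifications->manifest (list {quoted}))", missing
```

With exact matching, `numpy` lands in `missing` because Guix packages it as `python-numpy`; that gap is what step 2's name mapping would close.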

>>> What's currently undefined is the error handling:
>>> - if a Conda package does not exist in Guix
>>> - if the dependency graph is not solvable
>>> - if compiling the environment fails (due to mismatching dependency versions)
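
One way to start defining the error handling is to classify problems before building anything. The Python sketch below, with made-up names, reports a package missing from Guix, a package present but not at the requested version, and the simplest unsolvable case (the same package pinned at two different versions). It assumes a caller-supplied mapping from Guix package names to available versions. Build failures only show up when the environment is actually built, so a static check like this cannot catch them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Set, Tuple


class Problem(Enum):
    MISSING_PACKAGE = "no package with this name in Guix"
    MISSING_VERSION = "package exists, but not at the requested version"
    CONFLICT = "package requested at more than one version"


@dataclass
class Issue:
    name: str
    problem: Problem
    detail: str = ""


def check_specs(specs: List[Tuple[str, str]],
                available: Dict[str, Set[str]]) -> List[Issue]:
    issues = []
    pinned: Dict[str, str] = {}
    for name, version in specs:
        if name in pinned:
            # A second pin for the same package: report it only if it disagrees.
            if pinned[name] != version:
                issues.append(Issue(name, Problem.CONFLICT, f"{pinned[name]} vs {version}"))
            continue
        pinned[name] = version
        if name not in available:
            issues.append(Issue(name, Problem.MISSING_PACKAGE))
        elif version not in available[name]:
            choices = ", ".join(sorted(available[name]))
            issues.append(Issue(name, Problem.MISSING_VERSION, f"available: {choices}"))
    return issues
```

Each `MISSING_VERSION` issue is a natural place to fall back to the relaxed version matching discussed earlier in this thread instead of failing outright.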

>>> I believe there are many satisfactory stopping points for successful completion within the timeline of the summer, which I hope to present with my proposal soon.

>>> Thanks,
>>> Skylar


>>>> On March 22, 2023 5:44:52 PM EDT, Spencer Skylar Chan <schan12@terpmail.umd.edu> wrote:

>>>>> Hi Ricardo,

>>>>> On 3/22/23 14:19, Ricardo Wurmus wrote:
>>>>>>>> - Translating Snakemake to Guix Workflow Language (GWL)
>>>>>>>
>>>>>>> Ricardo, maybe you would have some suggestions. :-)
>>>>>>
>>>>>> Oh, this looks interesting. Could you please elaborate on the idea?

>>>>> My idea is to take as input a Snakemake workflow file and eventually output an equivalent GWL workflow file.

>>>>> Currently, Snakemake workflows can be exported to CWL (Common Workflow Language):

>>>>> https://snakemake.readthedocs.io/en/stable/executing/interoperability.html

>>>>> One approach could be to add CWL import/export capabilities to GWL. Then Snakemake/GWL conversion would be a 2-step process, using CWL as an intermediate step:
>>>>>
>>>>> 1. Snakemake -> CWL
>>>>> 2. CWL -> GWL
>>>>>
>>>>> However, CWL is not as expressive as Snakemake. There may be some details that are lost from Snakemake workflows.

>>>>> So a 1-step Snakemake/GWL transpiler could be interesting, as both Snakemake and GWL use a domain-specific language inside a general-purpose language (Python and Guile, respectively). There may be a possibility to achieve more "accurate" translations between workflows.
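
To give a feel for the 1-step approach, here is a deliberately small Python sketch that turns one Snakemake rule into GWL process text. It does not parse a Snakefile: the rule is built by hand in a hypothetical `SnakemakeRule` record. The emitted text is modelled on the Wisp-style process examples in the GWL manual (`packages`, `inputs`, `outputs`, and a `# { ... }` code snippet with `{{inputs}}`/`{{outputs}}` placeholders); treat that target syntax as an assumption to verify against the manual.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SnakemakeRule:
    # Hypothetical in-memory form of a rule; a real transpiler would get this
    # by parsing the Snakefile instead of building it by hand.
    name: str
    inputs: List[str]
    outputs: List[str]
    shell: str
    packages: List[str] = field(default_factory=list)


def _quoted(items: List[str]) -> str:
    return " ".join(f'"{item}"' for item in items)


def to_gwl_process(rule: SnakemakeRule) -> str:
    # Snakemake's {input}/{output} shell placeholders become the assumed GWL
    # {{inputs}}/{{outputs}} placeholders. Indexed or named forms such as
    # {input[0]} or {input.ref}, and wildcards, are not handled here.
    command = rule.shell.replace("{input}", "{{inputs}}").replace("{output}", "{{outputs}}")
    lines = [f"process {rule.name}"]
    if rule.packages:
        lines.append(f"  packages {_quoted(rule.packages)}")
    if rule.inputs:
        lines.append(f"  inputs {_quoted(rule.inputs)}")
    if rule.outputs:
        lines.append(f"  outputs {_quoted(rule.outputs)}")
    lines.append(f"  # {{ {command} }}")
    return "\n".join(lines)
```

Wildcards and named inputs are likely where a direct translation could preserve more than the CWL route, and also where most of the work would be.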

>>>>> Is this topic something that could fit into a summer project?



