unofficial mirror of guix-science@gnu.org 
* User & Developer Meetup on Sept. 27th: quick wrap-up
@ 2021-09-28  7:22 Simon Tournier
From: Simon Tournier @ 2021-09-28  7:22 UTC
  To: guix-science

Dear all,

On Monday, September 27th, about 10 people met for about 2 hours to map
out actions for the coming year around Guix in scientific contexts, from
reproducible research to high-performance computing. Here is a quick
wrap-up!

Do not hesitate to send an email about anything you would like to see,
or feel free to join #guix-hpc on irc.libera.chat to discuss one
specific item. Or please go ahead and help make it happen. :-)


* Organizing training sessions and workshops

  + we need more!
  + https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/talks
    how do we give more visibility to the material already there?
  + sharing material à la Software Carpentry
    https://github.com/swcarpentry/swcarpentry
  + "format" (package?) to easily share this material
  + sharing reading lists
    (the underlying idea is to have a weekly/monthly/? "newsletter")
  + PRACE-like EU training sessions
    (seasonal schools: summer, winter, etc.)
  + outreach effort
  + Libraries (jackhill): for archival as well as reproducing library
    practices


* Funding opportunities

  + EU project participation and grants to fund specific tasks
  + BIMSB aims to secure funding for PiGx, which uses Guix and Guile.
    There is a good chance to hire a person to hack on this in the
    coming year
    http://bioinformatics.mdc-berlin.de/pigx/
  + research topics
  + funding a few months of development/integration work
    (Software Heritage)
  + "keeping in touch", sharing opportunities
    (share calls for funding on topics: Repro on HPC, Bioinfo, etc.)
  + join GSoC or other organizations
  + run mentoring programs under our own organization?


* Long-term archival using Software Heritage and Disarchive

  + status: "guix lint -c archival" (git-fetch)
    and https://guix.gnu.org/sources.json (url-fetch)
    missing: tools to archive svn-fetch (and hg-fetch and a few minor
    others: CVS, etc.)
  + Disarchive can be used to "rebuild" tarballs 
    from content (SWH) and metadata (disarchive-DB)
    https://git.ngyro.com/disarchive
    PoC: https://git.ngyro.com/disarchive-db/
  + Data Service or Cuirass (Berlin)?
  + Timothy Sample started to create a Disarchive database 
    for previous Guix releases
  + left to be done:
    + build/maintain/publish the Disarchive database
    + archive the database
      (-> coordinate with Software Heritage)
  + metrics: what's the archive coverage on Software Heritage?
    what's the coverage of the Disarchive database?
  + want to contribute? email Simon Tournier or the mailing list
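
  To make the "rebuild from content and metadata" idea concrete, here
  is a minimal sketch in Python. It is not Disarchive's code or format:
  the function names, the metadata fields and the in-memory `store`
  (standing in for a content-addressed archive such as SWH) are all
  assumptions made for illustration, and real tarballs involve much
  more (compression, directories, ownership, and so on).

```python
import hashlib
import io
import tarfile


def build_tarball(metadata, store):
    """Rebuild an uncompressed tarball from metadata plus file contents.

    `metadata` is an ordered list of dicts (hypothetical format) and
    `store` maps a SHA-256 hex digest to the file's bytes.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        for entry in metadata:
            data = store[entry["sha256"]]
            info = tarfile.TarInfo(name=entry["name"])
            info.mode = entry["mode"]
            info.mtime = entry["mtime"]
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def disassemble(tarball):
    """Split a tarball into (metadata, store); regular files only."""
    metadata, store = [], {}
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # directories, links, etc. are ignored here
            data = tar.extractfile(member).read()
            digest = hashlib.sha256(data).hexdigest()
            store[digest] = data
            metadata.append({"name": member.name, "mode": member.mode,
                             "mtime": member.mtime, "sha256": digest})
    return metadata, store
```

  The point of the split is that only the small metadata part needs its
  own database; the file contents can be fetched by hash from an
  archive that already stores them.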


* Citing software using Software Heritage IDs and Guix

  + ReScience has been using SWHIDs for submitted software
    (submitters must provide the SWHID,
     which only takes a few clicks in the SWH web interface)
  + "guix show --format=bibtex" to produce an entry in the
    biblatex-software style
    https://www.ctan.org/tex-archive/macros/latex/contrib/biblatex-contrib/biblatex-software
  + ideal goal: feed Guix with a BibTeX snippet so it starts deploying
    it
    + a command to export the state and a list of packages
      which could be included in a paper as a citation
    + another command to import this citation and deploy again
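
  As an illustration of what such a citation could carry, here is a
  small Python sketch. The first function computes the SWHID of a
  single file's content (the "cnt" kind, which reuses Git's blob hash).
  The second formats a BibTeX-like entry; the function names, the entry
  key and the choice of fields are assumptions for illustration, with
  `@software` and `swhid` used as I understand them from the
  biblatex-software style. Identifiers for whole directories or
  revisions are computed differently and are not covered here.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Return the SWHID of a file's content (same hash as a Git blob)."""
    header = b"blob %d\0" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


def software_entry(key: str, title: str, version: str, swhid: str) -> str:
    """Format a minimal software citation entry (hypothetical layout)."""
    fields = {"title": title, "version": version, "swhid": swhid}
    body = ",\n".join(f"  {name} = {{{value}}}"
                      for name, value in fields.items())
    return f"@software{{{key},\n{body}\n}}"


if __name__ == "__main__":
    # Hypothetical example data, not a real package record.
    swhid = swhid_for_content(b"(display \"hello\")\n")
    print(software_entry("example-pkg", "Example package", "1.0", swhid))
```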


* “Converting” reproducible/active papers to use Guix

  + Example: https://rescience.github.io/bibliography/Courtes_2020.html
  + write "source-code-to-PDF" pipelines, automated with Guix,
    for other papers out there
    (PDF or document in a broad sense)
  + task: find candidate papers that can be automated
    (example: the ReScience collection)
  + task: organize a hackathon to work collectively on papers?


* Packaging machine learning frameworks

  + status: we have TensorFlow 1.9, TensorFlow Lite 2.x, PyTorch and
    scikit-learn
  + we are stuck with TensorFlow 1.9
    because that's the last version to provide a CMake build system
  + TensorFlow blocker: the Bazel build system
    (written in Java, cannot be built from source)
    + Debian package?
      https://sources.debian.org/src/bazel-bootstrap/3.5.1+ds-3/debian/control/
    + Ricardo looked at Bazel-to-CMake converters
      - cons: not good enough for TensorFlow
    + idea: use Bazel to produce a "degenerate" build system
    + task: package more Java dependencies (for Bazel)
  + required reading:
    https://hpc.guix.info/blog/2021/09/whats-in-a-package/
  + tensorflow-lite still needs its Python bindings
    (this should not be too difficult)
    - cons: tensorflow-lite is "not very useful"
  + related question: how do we package ML applications?
    what's the source: the trained model, or the data set?
    how do we distribute huge data sets?


* Packaging (or not packaging) datasets

  + relevant for ML models but also many other domains
  + idea: establish contacts with communities working on this question
    + https://www.datalad.org/
    + https://www.pachyderm.com/ data management
    + git-annex


* Programming with GPUs

  + NVIDIA has a monopoly; CUDA is available in the guix-hpc-non-free
    channel at INRIA
  + but this very much goes against our goal of building a transparent
    software stack
  + how could we make it easier to support "quick-and-dirty" packaging?
    + binary-build-system from the nonguix channel
    + example: Zotero
    + idea: "guix environment --fhs" to provide an FHS-compliant file
      tree
      + possible starting point:
        https://gitlab.com/pkill-9/guix-packages-free/blob/master/pkill9/services/fhs.scm

* Julia packaging and importer

   + Efraim prepared 100+ packages
   + Documenter.jl (JS stuff) is required by Flux (ML) and many, many
     others
     + patches on the mailing list for packages without JS support
   + Simon started writing an importer
     + problem: information in the package registry is hard to use
       as-is; a service by Julia Computing, Inc. helps a bit
       + https://github.com/JuliaRegistries/General/tree/master/F/Flux
         requires parsing a lot of TOML, and synopsis+description are
         not there
       + https://juliahub.com/docs/Flux/QdkVy/0.11.6/pkg.json
         API with JSON containing almost everything
     + policy question: can Guix (the importer?) rely on this service?


* CPU micro-architecture support with function multi-versioning

   + function multiversioning, like in Clear Linux
   + benchmarking to see if it even provides a benefit
   + prototype:
https://gitlab.inria.fr/guix-hpc/function-multi-versioning
   + task: find candidate code for automatic FMV + benchmarks
   + gcc-11 supports different x86_64 "micro-architectures"
     + https://gcc.gnu.org/gcc-11/changes.html
     + GCC now supports micro-architecture levels
       defined in the x86-64 psABI
       via -march=x86-64-v2, -march=x86-64-v3 and -march=x86-64-v4.
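
   To show the selection logic behind multi-versioning, here is a rough
   Python sketch. In practice the compiler generates the variants (for
   example with GCC's target_clones attribute) and the choice is made
   at load time, so this only mimics the decision. The flag names are
   the spellings I expect in Linux's /proc/cpuinfo, and the function
   names and the `variants` table are assumptions for illustration.

```python
# Flags each level adds on top of the previous one (levels are cumulative).
LEVEL_FLAGS = [
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni",
                   "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c",
                   "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd",
                   "avx512dq", "avx512vl"}),
]
LEVEL_ORDER = ["x86-64"] + [name for name, _ in LEVEL_FLAGS]


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU flags of the first processor listed (Linux only)."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


def highest_level(flags):
    """Return the highest -march level whose requirements are all met."""
    level = "x86-64"
    for name, required in LEVEL_FLAGS:
        if not required <= flags:
            break  # a missing level rules out all higher ones
        level = name
    return level


def dispatch(variants, flags):
    """Pick the most specialized variant that this CPU can run."""
    supported = LEVEL_ORDER[: LEVEL_ORDER.index(highest_level(flags)) + 1]
    for name in reversed(supported):
        if name in variants:
            return variants[name]
    raise LookupError("no baseline variant provided")
```

   A caller would pass something like
   dispatch({"x86-64": generic_fn, "x86-64-v3": avx2_fn}, read_cpu_flags()),
   where generic_fn and avx2_fn are hypothetical implementations of the
   same routine.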


* Relocatable pack execution on clusters that lack user namespaces

  + fixing issues when running MPI applications using the "fakechroot"
    execution engine on big clusters


* Run the FEniCSX and Firedrake tutorial cases using Guix-Jupyter

   + main tasks: package FEniCSX and Firedrake (+ dependencies, mostly
     Python)
   + initial work to happen on a channel
     (contact: Paul Garlick)


* Hosting a list of scientific channels at hpc.guix.info/channels

   + provide substitutes via Cuirass
   + discussion about free software and binary substitutes?


We plan to organize a hackathon day soon* to make progress on one
specific item. Stay tuned!

*soon: the date is not fixed yet, most likely in November or December.

All the best

