all messages for Guix-related lists mirrored at yhetil.org
From: Efraim Flashner <efraim@flashner.co.il>
To: Pierre Neidhardt <mail@ambrevar.xyz>
Cc: guix-devel@gnu.org
Subject: Re: [BLOG] rust blog post
Date: Tue, 26 Nov 2019 12:27:37 +0200
Message-ID: <20191126102737.GK1124@E5400>
In-Reply-To: <87a78jm01a.fsf@ambrevar.xyz>


[-- Attachment #1.1: Type: text/plain, Size: 470 bytes --]

Hopefully this is better. I added a new line between each paragraph

On Tue, Nov 26, 2019 at 10:58:41AM +0100, Pierre Neidhardt wrote:
> I think the attachment broke the formatting of the file (there is no
> paragraph break).  Could you resend it?
> 

-- 
Efraim Flashner   <efraim@flashner.co.il>   אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted

[-- Attachment #1.2: rust-in-guix --]
[-- Type: text/plain, Size: 8353 bytes --]

It's easy to think of Rust as a new programming language, but it has already been around for five years. Rust has made it past its 1.0 release and the compiler is written in Rust. We even have mrustc to act as a secondary method to bootstrap new Rust releases without falling back to downloading precompiled tarballs. So what is the state of Rust in Guix today?

Truthfully, Rust in Guix could be better. The developer story for Rust is pretty straightforward: write your program, declare your dependencies in a Cargo.toml file, and ```cargo foo``` will figure out your dependency chain. ```cargo build``` will download any missing dependencies, even using a cache directory to reduce downloads, and compile the bits of the dependencies that are needed.

But what about for distro maintainers?

Obviously we can't download dependencies at build time; they need to be packaged ahead of time. So we package those dependencies. But wait, those dependencies have dependencies of their own, and those do too. It's dependencies all the way down, hidden in five years of iterative development that we're late to the party for, trying to capture snapshots in time where specific versions of libraries were built using previous generations. All this goes all the way back to the beginning, whenever that is.

Obviously humans are prone to errors, so to work around this while packaging Rust crates Guix effectively has two importers for crates: one that will import a specific version and list its dependencies, and one that can take a crate and recursively import all the packages that it depends on. Currently some work is needed to allow the recursive importer to interpret version numbers, but for now it works quite well.
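
To make the difference between the two importers concrete, here is a minimal sketch in Python. It is not the Guix importer itself; the crate names and the `TOY_INDEX` table are invented for illustration, and versions are ignored entirely. The single import only looks at direct dependencies, while the recursive one walks the whole graph.

```python
# Toy crate index: each crate maps to the crates it depends on directly.
# The names and edges are invented for illustration only.
TOY_INDEX = {
    "foo": ["bar", "log"],
    "bar": ["baz"],
    "baz": [],
    "log": ["baz"],
}


def import_one(crate, index):
    """Non-recursive import: return only the crate's direct dependencies."""
    return list(index[crate])


def import_recursive(crate, index):
    """Recursive import: return every crate reachable from `crate`, itself included."""
    seen = []
    stack = [crate]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already imported through another path
        seen.append(current)
        stack.extend(index[current])
    return sorted(seen)


print(import_one("foo", TOY_INDEX))
print(import_recursive("foo", TOY_INDEX))
```

Note that baz is reached through both bar and log but is only imported once; a real importer also has to decide which version of baz each of them wants, which is the part that still needs work.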

Taking a break from Rust for a moment, let's look at some of the other languages that are packaged. Packages written in C/C++, processed with autotools or cmake or meson, are the easiest. Dependencies are declared, source code is provided, and there's a clear distinction between source code and compiled binary; source code is for hacking on, binaries are for executing. The closest to a middle ground are libraries which allow programs to use features from other programs. In order to use a package, all of its dependencies must be packaged and the libraries linked.

Taking a look at the other end we have Javascript. Javascript is source code; it's ready to be read and hacked on. Javascript is already ready to be run, therefore it must be a binary. It's... both? Javascript libraries leave distro maintainers in a difficult position. Building Javascript ends up in the same problem as we saw with Rust: recursive dependencies all the way down, iterative versions depending on previous ones, and a misty past from whence everything sprang forth, which must be recreated in order to bring us back to the present day. But there's more difficulty: often, even after a 'build' phase and the tests have been run on Javascript, we're left with unchanged code. Except now it's no longer source, it's a binary... or something. So just what did we build and test?

We can worry about Javascript another time, Rust has a clear boundary between source code and binaries.

So how about Python? Python is a scripting language and can be run without being compiled, but it can also be compiled (pre-interpreted?) to bytecode and installed either locally or globally. That leaves us with source code which can double as a binary, and bytecode which is clearly a binary. Given these two states, we declare the uncompiled version as source code, ignore that it can be run as a script except when testing the code, and we never return to second-guess ourselves.

How about Go? Go is another language that defies packaging efforts, primarily because build instructions often make use of the HEAD of other git branches, not tagged and released versions. That the names of the libraries are long and cumbersome is mostly a secondary issue. On the developer side a binary is a ```go build``` away. Go will download missing source and compile libraries as needed. On the packager side the libraries are carefully gathered one by one, precompiled, and placed carefully in a directory hierarchy for use in future builds. What could be a long build of a program is replaced by an intermediate series of packages where libraries are pre-compiled, and at each stage only the new code has to be compiled.

For all except the distro maintainer, the similarities are strong between Rust and Go. In both cases dependencies are downloaded as part of the build process, there's a cache for the downloaded sources and the compiled libraries, and build artifacts can be reused between different programs with overlapping dependencies. For the distro maintainer many of these similarities are thrown out. Dependencies are packaged ahead of time and previously packaged libraries are literally a cache. Libraries can be reused for other packages, yes, but for Rust they're not.

Why not? If they're already compiled why not reuse them?

Previously we've discussed source code and compiled binaries (or libraries), but in Rust there are two types of libraries. There are dynamic libraries, packaged as ```libfoo.so```, and there are Rust libraries, packaged as ```libfoo.rlib``` or ```libfoo-MAGICHASH.rlib```. When a Rust package declares a dependency on a Rust library, it doesn't declare a dependency on the whole library but rather just on the parts that it needs. This means that we can get away with packaging only a portion of the dependent library, or the library with only some of its features or its own dependencies. When compiling a final binary, a Rust binary doesn't link to an rlib; it takes just the part that it needs and incorporates it into the binary. As far as package maintainers are concerned, this isn't ideal but it is something we can live with; we already have this case with static libraries from other languages. If we were to compile the binary manually the command would be something like ```rustc --crate-type bin --crate-name foo --extern bar=/path/to/libbar.rlib src/main.rs``` and we'd continue on. However, when bar depends on baz, the similar command, ```rustc --crate-type lib --crate-name bar --extern baz=/path/to/libbaz.rlib src/lib.rs``` _doesn't_ link libbaz to libbar. This leaves us in a pickle; we know which libraries we need but we're unable to compile them individually and build them up iteratively until we reach the end goal of a binary.

One of our packaged Rust programs, rust-cbindgen, is used by Icecat. Rust-cbindgen declares 8 (TODO: check this number) dependencies. When run outside of the build environment ```cargo build``` downloads a total of 58 (TODO: check this number) packages, compiles them and produces a binary. Our recursive importer created more than 300 new packages before it was told to stop. Returning to our build process for Rust libraries: since we couldn't link one rlib to another rlib, we opted to compile one rlib and then place its source in the build directory of the next one, where it was recompiled. Baz would be built, then baz's source would be put in bar's vendor directory where baz and bar would be built. After this baz's and bar's sources would be put in foo's vendor directory, where all three would be compiled. This sounds like Go, except that we're throwing away all the results of our builds each time we start a new package.
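
The cost of that scheme can be sketched with a small model. This is only an illustration of the foo/bar/baz chain described above, not code from Guix; the `DEPENDS_ON` table and both helper functions are hypothetical. It assumes every package build recompiles everything found in its vendor directory and keeps nothing from earlier builds.

```python
# Hypothetical chain from the text: foo depends on bar, bar depends on baz.
DEPENDS_ON = {"foo": ["bar"], "bar": ["baz"], "baz": []}


def vendored_sources(package, depends_on):
    """Sources present in a package's build: its own plus all transitive dependencies."""
    sources = {package}
    for dep in depends_on[package]:
        sources |= vendored_sources(dep, depends_on)
    return sources


def compile_counts(packages, depends_on):
    """Count how often each crate is compiled when every package build
    recompiles its whole vendor directory from scratch."""
    counts = {name: 0 for name in depends_on}
    for package in packages:
        for crate in vendored_sources(package, depends_on):
            counts[crate] += 1
    return counts


counts = compile_counts(["baz", "bar", "foo"], DEPENDS_ON)
print(counts)
```

In this model baz is compiled three times, bar twice and foo once, and only the last build of each ends up in anything a user runs.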

Since we were just copying the sources from package to package, the simplest solution was to consider the Rust dependencies as shared sources and not as shared libraries. Yes, the same source would be used between multiple programs, but each package already only took the small portion of the shared source that it needed, so there was no benefit to compiling the entire package ahead of time, especially with the mounting recursive dependencies, whose compiled libraries were being thrown away anyway.

Rust-cbindgen ships with a Cargo.toml listing 8 dependencies. It also ships with a Cargo.lock, detailing the 8 dependencies and the bits of other libraries that are needed. By packaging the sources of the 58 enumerated libraries and placing them in the vendor directory where the necessary parts could be compiled, we ended up at the same place we were headed anyway; only the sources were propagated from package build to package build, only the source was the relevant part, only the source is shared.
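
As a rough illustration of what "enumerated" means here, the sketch below pulls the name and version of every ```[[package]]``` entry out of a lock file. The excerpt is made up in the style of a Cargo.lock and is not cbindgen's real lock file, and the line-by-line parsing is a simplification; a proper TOML parser would be the safer choice for real files.

```python
# A made-up excerpt in the style of a Cargo.lock file; the crate names and
# versions are placeholders, not cbindgen's real lock file.
LOCK_TEXT = '''
[[package]]
name = "example-app"
version = "0.1.0"
dependencies = [
 "example-dep 1.2.3 (registry+https://github.com/rust-lang/crates.io-index)",
]

[[package]]
name = "example-dep"
version = "1.2.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
'''


def locked_packages(lock_text):
    """Return (name, version) pairs for every [[package]] entry, in file order."""
    packages = []
    current = None
    for raw_line in lock_text.splitlines():
        line = raw_line.strip()
        if line == "[[package]]":
            current = {}  # start collecting fields for a new entry
            packages.append(current)
        elif current is not None and " = " in line:
            key, value = line.split(" = ", 1)
            if key in ("name", "version"):
                current[key] = value.strip('"')
    return [(p["name"], p["version"]) for p in packages]


for name, version in locked_packages(LOCK_TEXT):
    print(name, version)
```

Each pair in that list corresponds to one source tarball that has to be available in the vendor directory before the build starts.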

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
