unofficial mirror of guix-devel@gnu.org 
From: Bengt Richter <bokr@bokr.com>
To: Efraim Flashner <efraim@flashner.co.il>
Cc: guix-devel@gnu.org
Subject: Re: [BLOG] rust blog post
Date: Tue, 26 Nov 2019 04:26:43 -0800
Message-ID: <20191126122643.GA116467@PhantoNv4ArchGx.localdomain>
In-Reply-To: <20191126102737.GK1124@E5400>

Hi Guix,
In case someone likes a narrower style:

On +2019-11-26 12:27:37 +0200, Efraim Flashner wrote:
> Hopefully this is better. I added a new line between each paragraph
> 
> On Tue, Nov 26, 2019 at 10:58:41AM +0100, Pierre Neidhardt wrote:
> > I think the attachment broke the formatting of the file (there is no
> > paragraph break).  Could you resend it?
> > 
> 
> -- 
> Efraim Flashner   <efraim@flashner.co.il>   אפרים פלשנר
> GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
> Confidentiality cannot be guaranteed on emails sent or received unencrypted

    It's easy to think of Rust as a new programming language but
    it has already been around for five years. Rust has made  it
    past its  1.0 release and  the compiler is written in  Rust.
    We even  have  mrustc  to  act  as  a  secondary  method  to
    bootstrap  new  Rust  releases   without  falling  back   to
    downloading precompiled  tarballs. So  how is  the state  of
    Rust in Guix today?

    Truthfully, Rust  in Guix  could  be better.  The  developer
    story  for  Rust  is  pretty  straightforward:  write   your
    program, declare your dependencies in a Cargo.toml file, and
    ```cargo foo```  will  figure  out  your  dependency  chain.
    ```cargo build```  will download  any missing  dependencies,
    even using  a  cache  directory  to  reduce  downloads,  and
    compile the bits of the dependencies that are needed.

    But what about for distro maintainers?

    Obviously we can't download dependencies at build time; they
    need to  be packaged  ahead  of time.  So we  package  those
    dependencies. But wait, those dependencies have dependencies
    that are needed, and those  ones too. It's dependencies  all
    the way down,  hidden in  5 years  of iterative  development
    that we're late to the party to, trying to capture snapshots
    in time  where  specific  versions of  libraries were  built
    using previous generations.  And so on,  all the way back to
    the beginning, whenever that is.

    Obviously humans are prone to errors, so to work around this
    while  packaging  Rust  crates  Guix  has  effectively   two
    importers for  crates,  one  that  will  import  a  specific
    version and list its  dependencies, and one that can take  a
    crate and  recursively  import  all  the  packages  that  it
    depends on.  Currently  some work  is  needed to  allow  the
    recursive importer to interpret version numbers, but for now
    it works quite well.
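
    As a rough illustration of what a recursive import has to
    do, here is a minimal Python sketch. It is not the Guix
    importer (which is written in Guile); the crate names and
    the CRATE_INDEX table are made up for the example, it
    ignores version numbers entirely, and it assumes the
    dependency graph has no cycles.

```python
# Hypothetical crate index: crate name -> direct dependencies.
# These names are illustrative only, not real crates.io metadata.
CRATE_INDEX = {
    "foo": ["bar", "qux"],
    "bar": ["baz"],
    "baz": [],
    "qux": ["baz"],
}


def import_recursively(crate, index, seen=None):
    """Return every crate needed to build `crate`, dependencies first.

    Each crate is visited only once, even if several crates depend on it.
    Assumes the dependency graph is acyclic.
    """
    if seen is None:
        seen = []
    for dep in index[crate]:
        if dep not in seen:
            import_recursively(dep, index, seen)
    if crate not in seen:
        seen.append(crate)
    return seen


order = import_recursively("foo", CRATE_INDEX)
print(order)
```

    Even in this toy graph, asking for one crate (foo) means
    packaging four.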

    Taking a break from Rust for a moment, let's look at some of
    the other languages that  are packaged. Packages written  in
    C/C++, processed with autotools or  cmake or meson, are  the
    easiest. Dependencies are declared, source code is provided,
    and there's  a clear  distinction  between source  code  and
    compiled binary; source code is for hacking on, binaries are
    for executing. The closest to a middle ground are  libraries
    which allow programs to use features from other programs. In
    order to  use a  package, all  of its  dependencies must  be
    packaged and the libraries linked.

    Taking  a  look  at  the  other  end  we  have   Javascript.
    Javascript is source code, it's ready to be read and  hacked
    on. Javascript is already ready to be run, therefore it must
    be a binary. It's... both? Javascript libraries leave distro
    maintainers in  a  difficult position.  Building  Javascript
    ends up in the same problem  as we saw with Rust,  recursive
    dependencies all the way down, iterative versions  depending
    on previous ones,  and a misty  past from whence  everything
    sprang forth, which must be  recreated in order to bring  us
    back to the present day. But there's more difficulty:  often
    even after a 'build' phase has been run and tests have  been
    run on Javascript we're left with unchanged code. Except now
    it's no longer  source, it's  a binary...  or something.  So
    just what did we build and test?

    We can worry about Javascript another time; Rust has a clear
    boundary between source code and binaries.

    So how about Python? Python is a scripting language and  can
    be run without being compiled,  but it also can be  compiled
    (pre-interpreted?) to bytecode and installed either  locally
    or globally.  That  leaves us  with  source code  which  can
    double as  a  binary, and  a  bytecode which  is  clearly  a
    binary. Given these  two states, we  declare the  uncompiled
    version as  source code,  ignore that  it can  be run  as  a
    script except when testing the code, and we never return  to
    second-guess ourselves.

    How about Go? Go is  another language that defies  packaging
    efforts, primarily because build instructions often make use
    of the HEAD of other  git branches, not tagged and  released
    versions. That  the  names of  the  libraries are  long  and
    cumbersome is  mostly a  secondary issue.  On the  developer
    side a binary  is a  ```go build``` away.  Go will  download
    missing  source and  compile  libraries as  needed.  On  the
    packager side the  libraries are carefully  gathered one  by
    one,  precompiled,  and  placed  carefully  in  a  directory
    hierarchy for use  in future  builds. What could  be a  long
    build of a program is replaced by an intermediate series  of
    packages where libraries are pre-compiled, and at each stage
    only the new code has to be compiled.

    For all except the  distro maintainer, the similarities  are
    strong between Rust and Go.  In both cases dependencies  are
    downloaded as part of the build process, there's a cache for
    the downloaded sources and the compiled libraries, and build
    artifacts can  be  reused between  different  programs  with
    overlapping dependencies. For the distro maintainer many  of
    these similarities are thrown out. Dependencies are packaged
    ahead  of  time   and  previously  packaged  libraries   are
    literally  a  cache.  Libraries  can  be  reused  for  other
    packages, yes, but for Rust they're not.

    Why not? If they're already compiled why not reuse them?

    Previously we've discussed source code and compiled binaries
    (or  libraries),  but  in  Rust  there  are  two  types   of
    libraries.  There   are  dynamic   libraries,  packaged   as
    ```libfoo.so```, and there are  Rust libraries, packaged  as
    ```libfoo.rlib```  or  ```libfoo-MAGICHASH.rlib```.  When  a
    Rust package declares  a dependency  on a  Rust library,  it
    doesn't declare a dependency on the whole library but rather
    just on the parts that it needs. This means that we can  get
    away with packaging only a portion of the dependent library,
    or the library  with only some  of its features  or its  own
    dependencies. When compiling a  final binary, a Rust  binary
    doesn't link to  an rlib,  it takes  just the  part that  it
    needs and incorporates it into the binary. As far as package
    maintainers are  concerned,  this  isn't  ideal  but  it  is
    something we can live with;  we already have this case  with
    static libraries from other languages. If we were to compile
    the   binary  manually   the  command   would  be   ```rustc
    --crate-type bin foo.rs --extern bar=/path/to/libbar.rlib```
    and we'd continue on.  However, when bar depends on baz, the
    similar command,  ```rustc --crate-type rlib bar.rs --extern
    baz=/path/to/libbaz.rlib``` _doesn't_ link libbaz to libbar.
    This leaves us in a pickle;  we know which libraries we need
    but we're unable to compile them individually and build them
    up iteratively until we reach the binary end goal.

    One of our packaged Rust programs, rust-cbindgen, is used by
    Icecat. Rust-cbindgen declares 8  (TODO: check this  number)
    dependencies. When  run  outside of  the  build  environment
    ```cargo build``` downloads a total of 58 (TODO: check  this
    number) packages, compiles them  and produces a binary.  Our
    recursive importer created more than 300 new packages before
    it was told to stop. Returning to our build process for Rust
    libraries, since we couldn't link one rlib to another  rlib,
    we opted to compile  one rlib and then  place its source  in
    the build directory of the next one where it was recompiled.
    Baz would be built, then baz's source would be put in  bar's
    vendor directory where  baz and  bar would  be built.  After
    this baz's and bar's  sources would be  put in foo's  vendor
    directory, where all  three would be  compiled. This  sounds
    like Go, except that we're throwing away all the results  of
    our builds each time we start a new package.
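
    To put a number on that, here is a small Python sketch of
    the foo, bar, baz chain above. It is only a model, under
    the assumption that every package build recompiles
    everything in its vendor directory; the DEPENDS_ON table
    and the function name are invented for the illustration.

```python
# Hypothetical chain from the text: foo depends on bar, bar depends on baz.
DEPENDS_ON = {"foo": ["bar"], "bar": ["baz"], "baz": []}


def vendored_sources(package, depends_on):
    """Return the crates whose source ends up in `package`'s vendor
    directory, i.e. all of its transitive dependencies."""
    vendored = set()
    stack = list(depends_on[package])
    while stack:
        dep = stack.pop()
        if dep not in vendored:
            vendored.add(dep)
            stack.extend(depends_on[dep])
    return vendored


# Assumption: building a package compiles the package itself plus
# every crate vendored into it, and nothing is kept afterwards.
compiles = {
    pkg: 1 + len(vendored_sources(pkg, DEPENDS_ON)) for pkg in DEPENDS_ON
}
total_without_reuse = sum(compiles.values())

# For comparison: if compiled libraries could be reused between
# packages, each crate would be compiled exactly once.
total_with_reuse = len(DEPENDS_ON)

print(compiles, total_without_reuse, total_with_reuse)
```

    In this model the three-crate chain costs six crate
    compilations instead of three, and the gap widens as the
    dependency chain gets deeper.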

    Since we  were  just copying  the  sources from  package  to
    package, the  simplest solution  was  to consider  the  Rust
    dependencies as shared  sources and not as shared libraries.
    Yes,  the  same  source  would  be  used  between   multiple
    programs,  but  each  package already  only took  the  small
    portion of the shared source that it needed so there was  no
    benefit to  compiling  the  entire package  ahead  of  time,
    especially with the  mounting recursive dependencies,  whose
    compiled libraries were being thrown away anyway.

    Rust-cbindgen ships with a Cargo.toml listing 8 dependencies.
    It  also   ships  with   a  Cargo.lock,   detailing  the   8
    dependencies and  the  bits  of  other  libraries  that  are
    needed.  By  packing  the  sources  of  the  58   enumerated
    libraries and placing them in the vendor directory where the
    necessary parts could be compiled we ended at the same place
    we were headed anyway; only the sources were propagated from
    package build  to package  build, only  the source  was  the
    relevant part, only the source is shared.

-- 
Regards,
Bengt Richter
