Publishing with Lzip

unofficial mirror of guix-devel@gnu.org 

* Publishing with Lzip
From: Pierre Neidhardt @ 2019-03-10 18:24 UTC
  To: guix-devel

Hi,

I've just sent a patch of a first draft for the Lzip bindings:

  http://debbugs.gnu.org/cgi/bugreport.cgi?bug=34807

Switching from Gzip (Zlib) to Lzip would save up to about 50% of the disk
space used by substitutes, which would greatly benefit the servers and
help reduce bandwidth usage.

A few things are missing in the patch and I would probably need some help:

- The %liblz variable in guix/config.scm.in is not set dynamically by
  autoconf.
  This is because lzlib does not have a pkg-config entry.  I don't know
  much about autoconf, so if someone knows how to do this properly...

- I'm still a bit confused about how the API is supposed to work.  There
  is a lz-(de)compress-finish function which is supposed to be called
  according to the examples but the manual does not really say why.  If
  I call it, the tests fail.

- I'm also not 100% sure I'm doing the right thing with the encoder/decoder:
  should we write everything first, then read as much as we can?  Or
  chain write-read calls like in bbexample.c?

- I'm not 100% sure either that the terminating chunk will always be
  compressed / decompressed.  (It works in the test though.)  I don't
  really understand how Lzlib handles that part.

- How can I map between C enums and Guile with dynamic FFI?  This would
  be useful to improve error messages.

- lzlib.scm is not used for publishing in the patch.  Will do that
  later.  What are the strategies for transitioning from .gz to .lz?  I
  suggest the following:

  - On publishing, replace .gz with .lz compression.
  - When extracting, check the type and call the appropriate format decompressor.
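
One possible way to "check the type" when extracting is to look at the
first bytes of the stream: gzip data starts with the bytes 0x1f 0x8b, and
an lzip member starts with the four ASCII characters "LZIP".  The sketch
below assumes that approach; the names `nar_compression` and
`detect_compression` are hypothetical and do not come from Guix or lzlib.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not part of Guix or lzlib.  */
enum nar_compression { NAR_UNKNOWN = 0, NAR_GZIP, NAR_LZIP };

/* Guess the compression format from the first LEN bytes of a stream.
   gzip streams begin with 0x1f 0x8b; lzip members begin with "LZIP".  */
enum nar_compression
detect_compression (const unsigned char *buf, size_t len)
{
  static const unsigned char gzip_magic[] = { 0x1f, 0x8b };
  static const unsigned char lzip_magic[] = { 'L', 'Z', 'I', 'P' };

  if (len >= sizeof lzip_magic
      && memcmp (buf, lzip_magic, sizeof lzip_magic) == 0)
    return NAR_LZIP;

  if (len >= sizeof gzip_magic
      && memcmp (buf, gzip_magic, sizeof gzip_magic) == 0)
    return NAR_GZIP;

  /* Too short, or neither magic number matched.  */
  return NAR_UNKNOWN;
}
```

A caller would read the first four bytes, dispatch on the result, and
report an error for NAR_UNKNOWN rather than guessing.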

Feedback welcome!

--
Pierre Neidhardt
https://ambrevar.xyz/


* Re: Publishing with Lzip
From: Ludovic Courtès @ 2019-03-12 13:19 UTC
  To: Pierre Neidhardt; +Cc: guix-devel

Hi Pierre,

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> I've just sent a patch of a first draft for the Lzip bindings:
>
>   http://debbugs.gnu.org/cgi/bugreport.cgi?bug=34807
>
> Switching from Gzip (Zlib) to Lzip would save up to about 50% of the disk
> space used by substitutes, which would greatly benefit the servers and
> help reduce bandwidth usage.

Woohoo, awesome!

> A few things are missing in the patch and I would probably need some help:
>
> - The %liblz variable in guix/config.scm.in is not set dynamically by
>   autoconf.
>   This is because lzlib does not have a pkg-config entry.  I don't know
>   much about autoconf, so if someone knows how to do this properly...

I can give it a spin.

> - I'm still a bit confused about how the API is supposed to work.  There
>   is a lz-(de)compress-finish function which is supposed to be called
>   according to the examples but the manual does not really say why.  If
>   I call it, the tests fail.
>
> - I'm also not 100% sure I'm doing the right thing with the encoder/decoder:
>   should we write everything first, then read as much as we can?  Or
>   chain write-read calls like in bbexample.c?
>
> - I'm not 100% sure either that the terminating chunk will always be
>   compressed / decompressed.  (It works in the test though.)  I don't
>   really understand how Lzlib handles that part.

It’d be great to gain some confidence about these things.  :-) I haven’t
looked at the patch yet, but if you haven’t done it yet, I’d suggest
having tests like the one in tests/zlib.scm (compress and decompress a
bytevector of a random size with random contents), and you could
possibly perform more “stressful” tests manually as well (try to
compress/decompress tarballs, etc.)

I’d also recommend re-reading the API doc in the headers or whatever.
IME these APIs are very tricky to use and one has to pay attention to
every small detail.

> - How can I map between C enums and Guile with dynamic FFI?  This would
>   be useful to improve error messages.

According to the C standard an enum is an ‘int’.  So mapping them is
just a matter of producing/consuming ints.  The values of the enum start
from 0 and are incremented by 1 from then on, unless specific values are
provided.
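
The numbering rule can be seen in a small C sketch.  The enum below is
hypothetical and is not copied from lzlib.h.  Strictly speaking, it is the
enumeration constants that have type ‘int’; the enumerated type itself is
compatible with an implementation-defined integer type, so passing it
across the FFI as an int is an assumption that holds on common ABIs
rather than a guarantee.

```c
/* Hypothetical error enum, for illustration only; not from lzlib.h.  */
enum demo_error
{
  DEMO_OK,              /* 0: the first enumerator defaults to 0 */
  DEMO_BAD_ARGUMENT,    /* 1: previous value plus one */
  DEMO_MEM_ERROR,       /* 2 */
  DEMO_DATA_ERROR = 10, /* 10: explicit value */
  DEMO_LIBRARY_ERROR    /* 11: counting resumes from the explicit value */
};

/* What a foreign caller effectively receives: a plain integer.  */
int
demo_error_code (enum demo_error e)
{
  return (int) e;
}
```

On the Guile side, a binding would then declare the corresponding
argument or return value as an int and compare it against these numbers.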

> - lzlib.scm is not used for publishing in the patch.  Will do that
>   later.  What are the strategies for transitioning from .gz to .lz?  I
>   suggest the following:
>
>   - On publishing, replace .gz with .lz compression.
>   - When extracting, check the type and call the appropriate format decompressor.

The strategy I think would be to first have the complete tool chain in
Guix, that is support in ‘guix substitute’ and ‘guix publish’.

We won’t be able to change our servers to produce lzip overnight,
because old instances of ‘guix substitute’ wouldn’t be able to consume
it.

Perhaps one option would be to allow ‘guix publish’ to produce both gzip
and lzip, which we’d use during some transition period.  The difficulty
would be that narinfos currently provide just one URL for the nar, so
we’d need to either provide several URLs, or provide the right URL based
on some appropriate HTTP request header.  Let’s focus on the rest for
now.  :-)
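
For the header-based option, here is a rough sketch of the selection
logic only.  It assumes the client sends a comma-separated list of the
compressions it supports; which header would carry that list is left
open, and the function names are hypothetical.  Clients that do not
mention lzip get gzip, so old instances of ‘guix substitute’ would keep
working.  Quality values such as ";q=0.5" are deliberately not handled.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if TOKEN appears as a whole item in the comma-separated
   LIST (spaces around items are ignored), 0 otherwise.  */
static int
list_has_token (const char *list, const char *token)
{
  size_t token_len = strlen (token);

  while (*list != '\0')
    {
      /* Skip separators and leading spaces.  */
      while (*list == ' ' || *list == ',')
        list++;

      /* Find the end of the current item.  */
      const char *end = list;
      while (*end != '\0' && *end != ',')
        end++;

      /* Trim trailing spaces.  */
      const char *stop = end;
      while (stop > list && stop[-1] == ' ')
        stop--;

      if ((size_t) (stop - list) == token_len
          && strncmp (list, token, token_len) == 0)
        return 1;

      list = end;
    }

  return 0;
}

/* Hypothetical policy: serve lzip only to clients that ask for it,
   and fall back to gzip for everyone else.  */
const char *
choose_nar_compression (const char *accepted)
{
  if (accepted != NULL && list_has_token (accepted, "lzip"))
    return "lzip";
  return "gzip";
}
```

The same fallback rule would apply if narinfos listed several URLs
instead: a client picks the first compression it understands and old
clients keep seeing the gzip one.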

Thank you!

Ludo’.


* Re: Publishing with Lzip
From: Pierre Neidhardt @ 2019-03-12 13:39 UTC
  To: Ludovic Courtès; +Cc: guix-devel

> I can give it a spin.

Thanks!

> It’d be great to gain some confidence about these things.  :-) I haven’t
> looked at the patch yet, but if you haven’t done it yet, I’d suggest
> having tests like the one in tests/zlib.scm (compress and decompress a
> bytevector of a random size with random contents), and you could
> possibly perform more “stressful” tests manually as well (try to
> compress/decompress tarballs, etc.)

I've copied your test for zlib, it passes! :)

> I’d also recommend re-reading the API doc in the headers or whatever.
> IME these APIs are very tricky to use and one has to pay attention to
> every small detail.

I read the manual too many times.  The headers are not documented.  The examples
don't tell us more about the API.

I might be too inexperienced in the area, so maybe you or someone else could
have a look at the manual.

Else we could contact the maintainer and ask directly :D

> According to the C standard an enum is an ‘int’.  So mapping them is
> just a matter of producing/consuming ints.  The values of the enum start
> from 0 and are incremented by 1 from then on, unless specific values are
> provided.

My question was whether it's possible to have the mapping done "symbolically."
In C, you would match error values against the symbols of the enum, not against
the number.  So if we map the error numbers manually in Guile, it would break
whenever the API updates the enum.

Maybe I'm just being overly picky here :p

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/


* Re: Publishing with Lzip
From: Ludovic Courtès @ 2019-03-13 14:43 UTC
  To: Pierre Neidhardt; +Cc: guix-devel

Hi!

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

>> I’d also recommend re-reading the API doc in the headers or whatever.
>> IME these APIs are very tricky to use and one has to pay attention to
>> every small detail.
>
> I read the manual too many times.  The headers are not documented.  The examples
> don't tell us more about the API.
>
> I might be too inexperienced in the area, so maybe you or someone else could
> have a look at the manual.
>
> Else we could contact the maintainer and ask directly :D

Well, we’ll see!

>> According to the C standard an enum is an ‘int’.  So mapping them is
>> just a matter of producing/consuming ints.  The values of the enum start
>> from 0 and are incremented by 1 from then on, unless specific values are
>> provided.
>
> My question was whether it's possible to have the mapping done "symbolically."
> In C, you would match error values against the symbols of the enum, not against
> the number.  So if we map the error numbers manually in Guile, it would break
> whenever the API updates the enum.
>
> Maybe I'm just being overly picky here :p

Indeed.  :-)

The funny thing with the FFI is that it gives “a whole bunch of new
flexibility” to shoot yourself in the foot.  So for enums, while you
could do something fancy to extract the values from the headers (using
nyacc’s ffi-helper, or by running the C compiler at macro-expansion
time), what I would recommend is to just grab them once and for all.
:-)

In practice that works well: C library writers have an incentive to keep
ABI compatibility, so they’ll rarely change enum values.  This is
especially true for a library like this one whose API is probably set in
stone given that the scope of its functionality is well-defined.  We did
that for example in Guile-Gcrypt and Guile-Git and everything works
fine.
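
As an illustration of "grabbing them once and for all", the sketch below
records a code-to-name table by hand and looks codes up in it.  The
codes and names are made up for the example and are not lzlib's; in the
actual bindings the table would live in Scheme, and C is used here only
to keep the example self-contained.  The lookup falls back to a generic
name, so a code the table does not know about degrades the error message
instead of breaking the caller.

```c
#include <stddef.h>

/* One hand-recorded entry: the integer the library returns and the
   symbolic name shown to users.  All values here are hypothetical.  */
struct code_name
{
  int code;
  const char *name;
};

static const struct code_name pinned_errors[] = {
  { 0, "ok" },
  { 1, "bad-argument" },
  { 2, "mem-error" },
  { 3, "data-error" }
};

/* Return the recorded name for CODE, or "unknown-error" when CODE is
   not in the table (for instance, if a newer library adds a value).  */
const char *
pinned_error_name (int code)
{
  size_t count = sizeof pinned_errors / sizeof pinned_errors[0];

  for (size_t i = 0; i < count; i++)
    if (pinned_errors[i].code == code)
      return pinned_errors[i].name;

  return "unknown-error";
}
```

A unit test that triggers each error through the real library and
checks the resulting name is what would catch a mismatch between such a
table and the installed header.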

That also means that it’s a good idea to have unit tests that exercise
all the bindings.

HTH!

Ludo’.


* Re: Publishing with Lzip
From: Pierre Neidhardt @ 2019-03-13 15:31 UTC
  To: Ludovic Courtès; +Cc: guix-devel

Thanks for clarifying this!

-- 
Pierre Neidhardt
https://ambrevar.xyz/
