* When substitute download + decompression is CPU-bound
@ 2020-12-14 22:20 Ludovic Courtès
  2020-12-14 22:29 ` Julien Lepiller
  ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Ludovic Courtès @ 2020-12-14 22:20 UTC (permalink / raw)
  To: guix-devel

Hi Guix!

Consider these two files:

https://ci.guix.gnu.org/nar/gzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92
https://ci.guix.gnu.org/nar/lzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92

Quick decompression bench:

--8<---------------cut here---------------start------------->8---
$ du -h /tmp/uc.nar.[gl]z
103M	/tmp/uc.nar.gz
71M	/tmp/uc.nar.lz
$ gunzip -c < /tmp/uc.nar.gz | wc -c
350491552
$ time lzip -d < /tmp/uc.nar.lz > /dev/null

real	0m6.040s
user	0m5.950s
sys	0m0.036s
$ time gunzip -c < /tmp/uc.nar.gz > /dev/null

real	0m2.009s
user	0m1.977s
sys	0m0.032s
--8<---------------cut here---------------end--------------->8---

The decompression throughput (compressed bytes read in the first
column, uncompressed bytes written in the second column) is:

        input     | output
  gzip: 167 MiB/s | 52 MB/s
  lzip:  56 MiB/s | 11 MB/s

Indeed, if you run this from a computer on your LAN:

  wget -O - … | gunzip > /dev/null

you’ll find that wget caps at 50 MB/s with gunzip, whereas with lunzip
it caps at 11 MB/s.

From my place I get a peak download bandwidth of 30+ MB/s from
ci.guix.gnu.org, thus substitute downloads are CPU-bound (I can’t go
beyond 11 MB/s due to decompression).  I must say it never occurred to
me it could be the case when we introduced lzip substitutes.

I’d get faster substitute downloads with gzip (I would download more
but the time-to-disk would be smaller).  Specifically, download +
decompression of ungoogled-chromium from the LAN completes in 2.4s for
gzip vs. 7.1s for lzip.  On a low-end ARMv7 device, also on the LAN, I
get 32s (gzip) vs. 53s (lzip).

Where to go from here?  Several options:

  0. Lzip decompression speed increases with compression ratio, but
     we’re already using ‘--best’ on ci.  The only way we could gain is
     by using “multi-member archives” and then parallel decompression
     as done in plzip, but that’s probably not supported in lzlib.  So
     we’re probably stuck here.

  1. Since ci.guix.gnu.org still provides both gzip and lzip archives,
     ‘guix substitute’ could automatically pick one or the other
     depending on the CPU and bandwidth.  Perhaps a simple trick would
     be to check the user/wall-clock time ratio and switch to gzip for
     subsequent downloads if that ratio is close to one.  How well
     would that work?

  2. Use Zstd like all the cool kids since it seems to have a much
     higher decompression speed: <https://facebook.github.io/zstd/>.
     630 MB/s on ungoogled-chromium on my laptop.  Woow.

  3. Allow for parallel downloads (really: parallel decompression) as
     Julien did in <https://issues.guix.gnu.org/39728>.

My preference would be #2, #1, and #3, in this order.  #2 is great but
it’s quite a bit of work, whereas #1 could be deployed quickly.  I’m
not fond of #3 because it just papers over the underlying issue and
could be counterproductive if the number of jobs is wrong.

Thoughts?

Ludo’.

^ permalink raw reply	[flat|nested] 43+ messages in thread
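A quick way to check whether one's own connection is CPU-bound is to
time both pipelines directly; a minimal sketch built from the commands
above, assuming wget, gzip, and lzip are installed (any large nar URL
from ci.guix.gnu.org will do):

--8<---------------cut here---------------start------------->8---
# Download and decompress each variant; the transfer is CPU-bound when
# the reported ‘user’ time is close to the ‘real’ time.
nar=kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92
time wget -q -O - "https://ci.guix.gnu.org/nar/gzip/$nar" | gunzip > /dev/null
time wget -q -O - "https://ci.guix.gnu.org/nar/lzip/$nar" | lzip -d > /dev/null
--8<---------------cut here---------------end--------------->8---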
* Re: When substitute download + decompression is CPU-bound
  2020-12-14 22:20 When substitute download + decompression is CPU-bound Ludovic Courtès
@ 2020-12-14 22:29 ` Julien Lepiller
  2020-12-14 22:59 ` Nicolò Balzarotti
  2020-12-15 10:40 ` Jonathan Brielmaier
  2 siblings, 0 replies; 43+ messages in thread
From: Julien Lepiller @ 2020-12-14 22:29 UTC (permalink / raw)
  To: guix-devel, Ludovic Courtès

[-- Attachment #1: Type: text/plain, Size: 3502 bytes --]

My proposed changes to allow for parallel download assume downloads are
network-bound, so they can be separate from other jobs.  If downloads
are actually CPU-bound, then it has indeed no merit at all :)

Le 14 décembre 2020 17:20:17 GMT-05:00, "Ludovic Courtès" <ludo@gnu.org> a écrit :
>Hi Guix!
>
>Consider these two files:
>
>https://ci.guix.gnu.org/nar/gzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92
>https://ci.guix.gnu.org/nar/lzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92
>
>Quick decompression bench:
>
>--8<---------------cut here---------------start------------->8---
>$ du -h /tmp/uc.nar.[gl]z
>103M	/tmp/uc.nar.gz
>71M	/tmp/uc.nar.lz
>$ gunzip -c < /tmp/uc.nar.gz | wc -c
>350491552
>$ time lzip -d < /tmp/uc.nar.lz > /dev/null
>
>real	0m6.040s
>user	0m5.950s
>sys	0m0.036s
>$ time gunzip -c < /tmp/uc.nar.gz > /dev/null
>
>real	0m2.009s
>user	0m1.977s
>sys	0m0.032s
>--8<---------------cut here---------------end--------------->8---
>
>The decompression throughput (compressed bytes read in the first
>column, uncompressed bytes written in the second column) is:
>
>        input     | output
>  gzip: 167 MiB/s | 52 MB/s
>  lzip:  56 MiB/s | 11 MB/s
>
>Indeed, if you run this from a computer on your LAN:
>
>  wget -O - … | gunzip > /dev/null
>
>you’ll find that wget caps at 50 MB/s with gunzip, whereas with lunzip
>it caps at 11 MB/s.
>
>From my place I get a peak download bandwidth of 30+ MB/s from
>ci.guix.gnu.org, thus substitute downloads are CPU-bound (I can’t go
>beyond 11 MB/s due to decompression).  I must say it never occurred to
>me it could be the case when we introduced lzip substitutes.
>
>I’d get faster substitute downloads with gzip (I would download more
>but the time-to-disk would be smaller).  Specifically, download +
>decompression of ungoogled-chromium from the LAN completes in 2.4s for
>gzip vs. 7.1s for lzip.  On a low-end ARMv7 device, also on the LAN, I
>get 32s (gzip) vs. 53s (lzip).
>
>Where to go from here?  Several options:
>
>  0. Lzip decompression speed increases with compression ratio, but
>     we’re already using ‘--best’ on ci.  The only way we could gain is
>     by using “multi-member archives” and then parallel decompression
>     as done in plzip, but that’s probably not supported in lzlib.  So
>     we’re probably stuck here.
>
>  1. Since ci.guix.gnu.org still provides both gzip and lzip archives,
>     ‘guix substitute’ could automatically pick one or the other
>     depending on the CPU and bandwidth.  Perhaps a simple trick would
>     be to check the user/wall-clock time ratio and switch to gzip for
>     subsequent downloads if that ratio is close to one.  How well
>     would that work?
>
>  2. Use Zstd like all the cool kids since it seems to have a much
>     higher decompression speed: <https://facebook.github.io/zstd/>.
>     630 MB/s on ungoogled-chromium on my laptop.  Woow.
>
>  3. Allow for parallel downloads (really: parallel decompression) as
>     Julien did in <https://issues.guix.gnu.org/39728>.
>
>My preference would be #2, #1, and #3, in this order.  #2 is great but
>it’s quite a bit of work, whereas #1 could be deployed quickly.  I’m
>not fond of #3 because it just papers over the underlying issue and
>could be counterproductive if the number of jobs is wrong.
>
>Thoughts?
>
>Ludo’.

[-- Attachment #2: Type: text/html, Size: 4247 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: When substitute download + decompression is CPU-bound
  2020-12-14 22:20 When substitute download + decompression is CPU-bound Ludovic Courtès
  2020-12-14 22:29 ` Julien Lepiller
@ 2020-12-14 22:59 ` Nicolò Balzarotti
  2020-12-15  7:52   ` Pierre Neidhardt
  2020-12-15 11:36   ` Ludovic Courtès
  2020-12-15 10:40 ` Jonathan Brielmaier
  2 siblings, 2 replies; 43+ messages in thread
From: Nicolò Balzarotti @ 2020-12-14 22:59 UTC (permalink / raw)
  To: Ludovic Courtès, guix-devel

Ludovic Courtès <ludo@gnu.org> writes:

> Hi Guix!

Hi Ludo

> Quick decompression bench:

I guess this benchmark follows the distri talk, doesn't it? :)

File size with zstd vs zstd -9 vs current lzip:

- 71M uc.nar.lz
- 87M uc.nar.zst-9
- 97M uc.nar.zst-default

> Where to go from here?  Several options:
>
>   1. Since ci.guix.gnu.org still provides both gzip and lzip archives,
>      ‘guix substitute’ could automatically pick one or the other
>      depending on the CPU and bandwidth.  Perhaps a simple trick would
>      be to check the user/wall-clock time ratio and switch to gzip for
>      subsequent downloads if that ratio is close to one.  How well
>      would that work?

I'm not sure using heuristics (i.e., guessing what should work better,
as in 1.) is the way to go: a temporary slowdown of the network or CPU
during the first download would skew the decision.

>   2. Use Zstd like all the cool kids since it seems to have a much
>      higher decompression speed: <https://facebook.github.io/zstd/>.
>      630 MB/s on ungoogled-chromium on my laptop.  Woow.

I know this means more work to do, but it seems to be the best
alternative.  However, if we go that way, will we keep lzip
substitutes?  The 20% difference in size between lzip/zstd would mean a
lot with slow (mobile) network connections.

Nicolò

^ permalink raw reply	[flat|nested] 43+ messages in thread
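For reference, a sketch of how such size figures can be reproduced,
assuming zstd is installed and the uncompressed uc.nar from the first
message is at hand (the output file names simply mirror the ones
listed above):

--8<---------------cut here---------------start------------->8---
# zstd's default level is 3; it keeps the input file by default.
$ zstd uc.nar -o uc.nar.zst-default
$ zstd -9 uc.nar -o uc.nar.zst-9
$ du -h uc.nar.lz uc.nar.zst-default uc.nar.zst-9
--8<---------------cut here---------------end--------------->8---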
* Re: When substitute download + decompression is CPU-bound
  2020-12-14 22:59 ` Nicolò Balzarotti
@ 2020-12-15  7:52   ` Pierre Neidhardt
  2020-12-15  9:45     ` Nicolò Balzarotti
  2020-12-15 11:42     ` Ludovic Courtès
  1 sibling, 2 replies; 43+ messages in thread
From: Pierre Neidhardt @ 2020-12-15 7:52 UTC (permalink / raw)
  To: Nicolò Balzarotti, Ludovic Courtès, guix-devel

[-- Attachment #1: Type: text/plain, Size: 680 bytes --]

Another option is plzip (parallel Lzip, an official part of Lzip).

> decompression of ungoogled-chromium from the LAN completes in 2.4s for
> gzip vs. 7.1s for lzip.  On a low-end ARMv7 device, also on the LAN, I
> get 32s (gzip) vs. 53s (lzip).

With four cores, plzip would beat gzip in the first case.
With only 2 cores, plzip would beat gzip in the second case.

What's left to do to implement plzip support?  That's the good news:
almost nothing!

- On the Lzip binding side, we need to add support for multi-member
  archives.  It's a bit of work but not that much.
- On the Guix side, there is nothing to do.

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
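A minimal illustration of what plzip buys, assuming plzip is installed
and uc.nar is the uncompressed nar from earlier in the thread (the
thread count and level here are arbitrary choices):

--8<---------------cut here---------------start------------->8---
# Compress with 4 threads; plzip splits the output into several
# members, one per data block, which is what enables parallelism.
$ plzip -9 -n 4 -k uc.nar

# Decompress with 4 threads; this only helps if the archive contains
# several members, which plzip produces by default for large inputs.
$ time plzip -d -n 4 < uc.nar.lz > /dev/null
--8<---------------cut here---------------end--------------->8---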
* Re: When substitute download + decompression is CPU-bound
  2020-12-15  7:52 ` Pierre Neidhardt
@ 2020-12-15  9:45   ` Nicolò Balzarotti
  2020-12-15  9:54     ` Pierre Neidhardt
  0 siblings, 1 reply; 43+ messages in thread
From: Nicolò Balzarotti @ 2020-12-15 9:45 UTC (permalink / raw)
  To: Pierre Neidhardt, Ludovic Courtès, guix-devel

Pierre Neidhardt <mail@ambrevar.xyz> writes:

> Another option is plzip (parallel Lzip, an official part of Lzip).

Wouldn't that become a problem once we have parallel downloads (and
several parallel decompressions therefore sometimes run at once)?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15  9:45 ` Nicolò Balzarotti
@ 2020-12-15  9:54   ` Pierre Neidhardt
  2020-12-15 10:03     ` Nicolò Balzarotti
  0 siblings, 1 reply; 43+ messages in thread
From: Pierre Neidhardt @ 2020-12-15 9:54 UTC (permalink / raw)
  To: Nicolò Balzarotti, Ludovic Courtès, guix-devel

[-- Attachment #1: Type: text/plain, Size: 615 bytes --]

Nicolò Balzarotti <anothersms@gmail.com> writes:

> Pierre Neidhardt <mail@ambrevar.xyz> writes:
>
>> Another option is plzip (parallel Lzip, an official part of Lzip).
>
> Wouldn't that become a problem once we have parallel downloads (and
> several parallel decompressions therefore sometimes run at once)?

What do you mean?  Parallel decompression is unrelated to downloads as
far as I understand.  Once the archive (or just archive chunks?) is
available, plzip can decompress multiple segments at the same time if
enough cores are available.

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15  9:54 ` Pierre Neidhardt
@ 2020-12-15 10:03   ` Nicolò Balzarotti
  2020-12-15 10:13     ` Pierre Neidhardt
  0 siblings, 1 reply; 43+ messages in thread
From: Nicolò Balzarotti @ 2020-12-15 10:03 UTC (permalink / raw)
  To: Pierre Neidhardt, Ludovic Courtès, guix-devel

Pierre Neidhardt <mail@ambrevar.xyz> writes:

> What do you mean?

If you download multiple files at a time, you might end up
decompressing them simultaneously.  Plzip won't help on a dual-core
machine then, where you might end up CPU-bound again.  Is this right?

If it is, reducing overall CPU usage seems to be the better approach in
the long term.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15 10:03 ` Nicolò Balzarotti
@ 2020-12-15 10:13   ` Pierre Neidhardt
  2020-12-15 10:14     ` Pierre Neidhardt
  0 siblings, 1 reply; 43+ messages in thread
From: Pierre Neidhardt @ 2020-12-15 10:13 UTC (permalink / raw)
  To: Nicolò Balzarotti, Ludovic Courtès, guix-devel

[-- Attachment #1: Type: text/plain, Size: 728 bytes --]

Nicolò Balzarotti <anothersms@gmail.com> writes:

> If you download multiple files at a time, you might end up
> decompressing them simultaneously.  Plzip won't help on a dual-core
> machine then, where you might end up CPU-bound again.  Is this right?
>
> If it is, reducing overall CPU usage seems to be the better approach
> in the long term.

An answer to this may be in pipelining the process.  The parallel
downloads would feed the archives to the pipeline and the parallel
decompressor would pop the archives out of the pipeline one by one.

If I'm not mistaken, this should yield optimal results regardless of
the network or CPU performance.

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15 10:13 ` Pierre Neidhardt
@ 2020-12-15 10:14   ` Pierre Neidhardt
  0 siblings, 0 replies; 43+ messages in thread
From: Pierre Neidhardt @ 2020-12-15 10:14 UTC (permalink / raw)
  To: Nicolò Balzarotti, Ludovic Courtès, guix-devel

[-- Attachment #1: Type: text/plain, Size: 130 bytes --]

Here the "pipeline" could be a CSP channel.  Not sure what the term is
in Guile.

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15  7:52 ` Pierre Neidhardt
  2020-12-15  9:45   ` Nicolò Balzarotti
@ 2020-12-15 11:42   ` Ludovic Courtès
  2020-12-15 12:31     ` Pierre Neidhardt
  1 sibling, 1 reply; 43+ messages in thread
From: Ludovic Courtès @ 2020-12-15 11:42 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel, Nicolò Balzarotti

Hi,

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Another option is plzip (parallel Lzip, an official part of Lzip).
>
>> decompression of ungoogled-chromium from the LAN completes in 2.4s for
>> gzip vs. 7.1s for lzip.  On a low-end ARMv7 device, also on the LAN, I
>> get 32s (gzip) vs. 53s (lzip).
>
> With four cores, plzip would beat gzip in the first case.
> With only 2 cores, plzip would beat gzip in the second case.
>
> What's left to do to implement plzip support?  That's the good news:
> almost nothing!
>
> - On the Lzip binding side, we need to add support for multi-member
>   archives.  It's a bit of work but not that much.
> - On the Guix side, there is nothing to do.

Well, ‘guix publish’ would first need to create multi-member archives,
right?

Also, lzlib (which is what we use) does not implement parallel
decompression, AIUI.

Even if it did, would we be able to take advantage of it?  Currently
‘restore-file’ expects to read an archive stream sequentially.

Even if I’m wrong :-), decompression speed would at best be doubled on
multi-core machines (wouldn’t help much on low-end ARM devices), and
that’s very little compared to the decompression speed achieved by
zstd.

Ludo’.

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: When substitute download + decompression is CPU-bound
  2020-12-15 11:42 ` Ludovic Courtès
@ 2020-12-15 12:31   ` Pierre Neidhardt
  2020-12-18 14:59     ` Ludovic Courtès
  0 siblings, 1 reply; 43+ messages in thread
From: Pierre Neidhardt @ 2020-12-15 12:31 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, Nicolò Balzarotti

[-- Attachment #1: Type: text/plain, Size: 1716 bytes --]

Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> Well, ‘guix publish’ would first need to create multi-member archives,
> right?

Correct, but it's trivial once the bindings have been implemented.

> Also, lzlib (which is what we use) does not implement parallel
> decompression, AIUI.

Yes it does: multi-member archives are a non-optional part of the Lzip
spec, and lzlib implements the whole spec.

> Even if it did, would we be able to take advantage of it?  Currently
> ‘restore-file’ expects to read an archive stream sequentially.

Yes it works, I just tried this:

--8<---------------cut here---------------start------------->8---
cat big-file.lz | plzip -d -o big-file -
--8<---------------cut here---------------end--------------->8---

Decompression happens in parallel.

> Even if I’m wrong :-), decompression speed would at best be doubled on
> multi-core machines (wouldn’t help much on low-end ARM devices), and
> that’s very little compared to the decompression speed achieved by
> zstd.

Why doubled?  If the archive has more than CORE-NUMBER segments, then
the decompression duration can be divided by CORE-NUMBER.

All that said, I think we should have both:

- Parallel lzip support is the easiest to add at this point.
  It's the best option for people with low bandwidth.  This can benefit
  most of the planet I suppose.

- zstd is best for users with high bandwidth (or with slow hardware).
  We need to write the necessary bindings though, so it will take a bit
  more time.

Then the users can choose which compression they prefer, mostly
depending on their hardware and bandwidth.

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
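On the ‘guix publish’ side, stock lzip can already produce
multi-member archives, which is roughly what the lzlib bindings would
have to replicate; a sketch, assuming lzip ≥ 1.21 (the 8 MiB member
size is an arbitrary choice here):

--8<---------------cut here---------------start------------->8---
# Cap each member at 8 MiB; the result is still one valid .lz archive,
# but made of independent members that a parallel decompressor such as
# plzip can process concurrently.
$ lzip -9 -b 8Mi -k uc.nar
--8<---------------cut here---------------end--------------->8---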
* Re: When substitute download + decompression is CPU-bound
  2020-12-15 12:31 ` Pierre Neidhardt
@ 2020-12-18 14:59   ` Ludovic Courtès
  2020-12-18 15:33     ` Pierre Neidhardt
  0 siblings, 1 reply; 43+ messages in thread
From: Ludovic Courtès @ 2020-12-18 14:59 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel, Nicolò Balzarotti

Hi Pierre,

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Well, ‘guix publish’ would first need to create multi-member archives,
>> right?
>
> Correct, but it's trivial once the bindings have been implemented.

OK.

>> Also, lzlib (which is what we use) does not implement parallel
>> decompression, AIUI.
>
> Yes it does: multi-member archives are a non-optional part of the Lzip
> spec, and lzlib implements the whole spec.

Nice.

>> Even if it did, would we be able to take advantage of it?  Currently
>> ‘restore-file’ expects to read an archive stream sequentially.
>
> Yes it works, I just tried this:
>
>   cat big-file.lz | plzip -d -o big-file -
>
> Decompression happens in parallel.
>
>> Even if I’m wrong :-), decompression speed would at best be doubled on
>> multi-core machines (wouldn’t help much on low-end ARM devices), and
>> that’s very little compared to the decompression speed achieved by
>> zstd.
>
> Why doubled?  If the archive has more than CORE-NUMBER segments, then
> the decompression duration can be divided by CORE-NUMBER.

My laptop has 4 cores, so at best I’d get a 4x speedup, compared to the
10x speedup with zstd that also comes with much lower resource usage,
etc.

> All that said, I think we should have both:
>
> - Parallel lzip support is the easiest to add at this point.
>   It's the best option for people with low bandwidth.  This can benefit
>   most of the planet I suppose.
>
> - zstd is best for users with high bandwidth (or with slow hardware).
>   We need to write the necessary bindings though, so it will take a bit
>   more time.
>
> Then the users can choose which compression they prefer, mostly
> depending on their hardware and bandwidth.

Would you like to give parallel lzip a try?

Thanks!

Ludo’.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-18 14:59 ` Ludovic Courtès
@ 2020-12-18 15:33   ` Pierre Neidhardt
  0 siblings, 0 replies; 43+ messages in thread
From: Pierre Neidhardt @ 2020-12-18 15:33 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel, Nicolò Balzarotti

[-- Attachment #1: Type: text/plain, Size: 704 bytes --]

Ludovic Courtès <ludo@gnu.org> writes:

> My laptop has 4 cores, so at best I’d get a 4x speedup, compared to the
> 10x speedup with zstd that also comes with much lower resource usage,
> etc.

Of course, it's a trade-off between high compression and high speed :)
Since there is no universal best option, I think it's best to support
both.

> Would you like to give parallel lzip a try?

It shouldn't be too hard for me considering I already have experience
with Lzip, but I can only reasonably do this after FOSDEM, so in 1.5
months from now.  If I forget, please ping me ;)

If there is any taker before that, please go ahead! :)

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: When substitute download + decompression is CPU-bound
  2020-12-14 22:59 ` Nicolò Balzarotti
  2020-12-15  7:52   ` Pierre Neidhardt
@ 2020-12-15 11:36   ` Ludovic Courtès
  2020-12-15 11:45     ` Nicolò Balzarotti
  1 sibling, 1 reply; 43+ messages in thread
From: Ludovic Courtès @ 2020-12-15 11:36 UTC (permalink / raw)
  To: Nicolò Balzarotti; +Cc: guix-devel

Hi,

Nicolò Balzarotti <anothersms@gmail.com> skribis:

> I guess this benchmark follows the distri talk, doesn't it? :)

Yes, that and my own quest for optimization opportunities.  :-)

> File size with zstd vs zstd -9 vs current lzip:
>
> - 71M uc.nar.lz
> - 87M uc.nar.zst-9
> - 97M uc.nar.zst-default
>
>> Where to go from here?  Several options:
>
>>   1. Since ci.guix.gnu.org still provides both gzip and lzip archives,
>>      ‘guix substitute’ could automatically pick one or the other
>>      depending on the CPU and bandwidth.  Perhaps a simple trick would
>>      be to check the user/wall-clock time ratio and switch to gzip for
>>      subsequent downloads if that ratio is close to one.  How well
>>      would that work?
>
> I'm not sure using heuristics (i.e., guessing what should work better,
> as in 1.) is the way to go: a temporary slowdown of the network or CPU
> during the first download would skew the decision.

I suppose we could time each substitute download and adjust the choice
continually.

It might be better to provide a command-line flag to choose between
optimizing for bandwidth usage (users with limited Internet access may
prefer that) or for speed.

>>   2. Use Zstd like all the cool kids since it seems to have a much
>>      higher decompression speed: <https://facebook.github.io/zstd/>.
>>      630 MB/s on ungoogled-chromium on my laptop.  Woow.
>
> I know this means more work to do, but it seems to be the best
> alternative.  However, if we go that way, will we keep lzip
> substitutes?  The 20% difference in size between lzip/zstd would mean
> a lot with slow (mobile) network connections.

A lot in what sense?  In terms of bandwidth usage, right?

In terms of speed, zstd would probably reduce the time-to-disk as soon
as you have ~15 MB/s peak bandwidth or more.

Anyway, we’re not there yet, but I suppose if we get zstd support, we
could configure berlin to keep lzip and zstd (rather than lzip and gzip
as is currently the case).

Ludo’.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2020-12-15 11:36 ` Ludovic Courtès
@ 2020-12-15 11:45   ` Nicolò Balzarotti
  0 siblings, 0 replies; 43+ messages in thread
From: Nicolò Balzarotti @ 2020-12-15 11:45 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

Ludovic Courtès <ludo@gnu.org> writes:

> A lot in what sense?  In terms of bandwidth usage, right?

Yep, I think most mobile data plans are still limited.  Even though
here in Italy it's easy to get 50 GB+ per month, I don't think it's the
same worldwide.

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: When substitute download + decompression is CPU-bound
  2020-12-14 22:20 When substitute download + decompression is CPU-bound Ludovic Courtès
  2020-12-14 22:29 ` Julien Lepiller
  2020-12-14 22:59 ` Nicolò Balzarotti
@ 2020-12-15 10:40 ` Jonathan Brielmaier
  2020-12-15 19:43   ` Joshua Branson
  2 siblings, 1 reply; 43+ messages in thread
From: Jonathan Brielmaier @ 2020-12-15 10:40 UTC (permalink / raw)
  To: guix-devel

Super interesting findings!

On 14.12.20 23:20, Ludovic Courtès wrote:
>   2. Use Zstd like all the cool kids since it seems to have a much
>      higher decompression speed: <https://facebook.github.io/zstd/>.
>      630 MB/s on ungoogled-chromium on my laptop.  Woow.

Not only is decompression fast, compression is as well:

size  file               time for compression (lower is better)
335M  uc.nar
104M  uc.nar.gz          8
 71M  uc.nar.lz.level9   120
 74M  uc.nar.lz.level6   80
 82M  uc.nar.lz.level3   30
 89M  uc.nar.lz.level1   16
 97M  uc.nar.zst         1

So I am sold on zstd, as a user and as a substitute server caretaker :)

For mobile users and users without internet flat rates, the increased
nar size is a problem.  Although I think the problem there is not
between gzip, lzip, and zstd: it's the fact that we download the new
package in full even if it's just some 100 lines of diffoscope
diff[0].  And most of that is due to the changed /gnu/store name...

[0] diffoscope --max-diff-block-lines 0
    /gnu/store/zvcn2r352wxnmq7jayz5myg23gh9s17q-icedove-78.5.1
    /gnu/store/dzjym6y7b9z4apgvvydj9lf0kbaa8qbv-icedove-78.5.1
    lines: 783 size: 64k

^ permalink raw reply	[flat|nested] 43+ messages in thread
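A sketch of how such timings can be collected, assuming the
uncompressed uc.nar is in the current directory (the file names match
the table above; levels and tools are the ones being compared):

--8<---------------cut here---------------start------------->8---
# Time compression at several lzip levels, plus gzip and zstd at their
# defaults; -c writes to stdout so the input file is left untouched.
for level in 1 3 6 9; do
    time lzip -$level -c uc.nar > uc.nar.lz.level$level
done
time gzip -c uc.nar > uc.nar.gz
time zstd -c uc.nar > uc.nar.zst
--8<---------------cut here---------------end--------------->8---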
* Re: When substitute download + decompression is CPU-bound
  2020-12-15 10:40 ` Jonathan Brielmaier
@ 2020-12-15 19:43   ` Joshua Branson
  2021-01-07 10:45     ` Guillaume Le Vaillant
  0 siblings, 1 reply; 43+ messages in thread
From: Joshua Branson @ 2020-12-15 19:43 UTC (permalink / raw)
  To: Jonathan Brielmaier; +Cc: guix-devel

Looking on the Zstandard website (https://facebook.github.io/zstd/),
it mentions google's snappy compression library
(https://github.com/google/snappy).  Snappy has some fairly good
benchmarks too:

Compressor   Ratio   Compression   Decompress.
zstd         2.884   500 MB/s      1660 MB/s
snappy       2.073   560 MB/s      1790 MB/s

Would snappy be easier to use than Zstandard?

-- 
Joshua Branson
Sent from Emacs and Gnus
https://gnucode.me
https://video.hardlimit.com/accounts/joshua_branson/video-channels
https://propernaming.org
"You can have whatever you want, as long as you help enough other
people get what they want." - Zig Ziglar

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: When substitute download + decompression is CPU-bound
  2020-12-15 19:43 ` Joshua Branson
@ 2021-01-07 10:45   ` Guillaume Le Vaillant
  2021-01-07 11:00     ` Pierre Neidhardt
  2021-01-14 21:51     ` Ludovic Courtès
  1 sibling, 2 replies; 43+ messages in thread
From: Guillaume Le Vaillant @ 2021-01-07 10:45 UTC (permalink / raw)
  To: Joshua Branson; +Cc: guix-devel

[-- Attachment #1.1: Type: text/plain, Size: 353 bytes --]

I compared gzip, lzip and zstd when compressing a 580 MB pack
(therefore containing "substitutes" for several packages) with
different compression levels.  Maybe the results can be of some use to
someone.

Note that the plots only show the results using 1 thread and standard
compression levels, and that the speed axis uses a logarithmic scale.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: compression-benchmark.org --]
[-- Type: text/x-org, Size: 9159 bytes --]

Machine used for the tests:
- CPU: Intel i7-3630QM
- RAM: 16 MiB

Programs:
- gzip 1.10
- pigz 2.4
- lzip 1.21
- plzip 1.8
- zstd 1.4.4
- pzstd 1.4.4

Uncompressed file:
- name: monero-0.17.1.5-pack.tar
- size: 582707200 bytes

#+PLOT: script:"compression-benchmark.plot"
| Comp. command | Comp. time | Comp. size | Comp. speed | Comp. ratio | Decomp. time | Decomp. speed |
|---------------+------------+------------+-------------+-------------+--------------+---------------|
| gzip -1 | 7.999 | 166904534 | 72847506 | 3.491 | 3.292 | 50700041 |
| gzip -2 | 8.469 | 161859128 | 68804723 | 3.600 | 3.214 | 50360650 |
| gzip -3 | 10.239 | 157839772 | 56910558 | 3.692 | 3.144 | 50203490 |
| gzip -4 | 11.035 | 151039457 | 52805365 | 3.858 | 3.104 | 48659619 |
| gzip -5 | 13.767 | 146693142 | 42326375 | 3.972 | 3.143 | 46672969 |
| gzip -6 | 19.707 | 144364588 | 29568539 | 4.036 | 3.001 | 48105494 |
| gzip -7 | 24.014 | 143727357 | 24265312 | 4.054 | 2.993 | 48021168 |
| gzip -8 | 43.219 | 143062985 | 13482663 | 4.073 | 2.969 | 48185579 |
| gzip -9 | 70.930 | 142803637 | 8215243 | 4.080 | 2.964 | 48179365 |
| pigz -1 -p 4 | 2.247 | 165745308 | 259326747 | 3.516 | 1.919 | 86370666 |
| pigz -2 -p 4 | 2.394 | 160661935 | 243403175 | 3.627 | 1.862 | 86284605 |
| pigz -3 -p 4 | 2.776 | 156696382 | 209908934 | 3.719 | 1.817 | 86239065 |
| pigz -4 -p 4 | 3.045 | 150539955 | 191365255 | 3.871 | 1.787 | 84241721 |
| pigz -5 -p 4 | 3.855 | 146289903 | 151156213 | 3.983 | 1.732 | 84462992 |
| pigz -6 -p 4 | 5.378 | 143967093 | 108350167 | 4.048 | 1.721 | 83653163 |
| pigz -7 -p 4 | 6.579 | 143350506 | 88570786 | 4.065 | 1.702 | 84224739 |
| pigz -8 -p 4 | 11.76 | 142738270 | 49549932 | 4.082 | 1.720 | 82987366 |
| pigz -9 -p 4 | 19.878 | 142479078 | 29314176 | 4.090 | 1.691 | 84257290 |
| lzip -0 | 16.686 | 130302649 | 34921923 | 4.472 | 9.981 | 13055070 |
| lzip -1 | 42.011 | 118070414 | 13870348 | 4.935 | 8.669 | 13619842 |
| lzip -2 | 51.395 | 112769303 | 11337819 | 5.167 | 8.368 | 13476255 |
| lzip -3 | 69.344 | 106182860 | 8403138 | 5.488 | 8.162 | 13009417 |
| lzip -4 | 89.781 | 100072461 | 6490318 | 5.823 | 7.837 | 12769231 |
| lzip -5 | 119.626 | 95033235 | 4871075 | 6.132 | 7.586 | 12527450 |
| lzip -6 | 155.740 | 83063613 | 3741538 | 7.015 | 6.856 | 12115463 |
| lzip -7 | 197.485 | 78596381 | 2950640 | 7.414 | 6.586 | 11933857 |
| lzip -8 | 238.076 | 72885403 | 2447568 | 7.995 | 6.227 | 11704738 |
| lzip -9 | 306.368 | 72279340 | 1901985 | 8.062 | 6.203 | 11652320 |
| plzip -0 -n 4 | 4.821 | 131211238 | 120868533 | 4.441 | 2.829 | 46380784 |
| plzip -1 -n 4 | 13.453 | 120565830 | 43314294 | 4.833 | 2.604 | 46300242 |
| plzip -2 -n 4 | 15.695 | 114874773 | 37126932 | 5.073 | 2.398 | 47904409 |
| plzip -3 -n 4 | 20.563 | 108896468 | 28337655 | 5.351 | 2.486 | 43803889 |
| plzip -4 -n 4 | 26.871 | 102285879 | 21685356 | 5.697 | 2.375 | 43067739 |
| plzip -5 -n 4 | 35.220 | 97402840 | 16544781 | 5.982 | 2.448 | 39788742 |
| plzip -6 -n 4 | 45.812 | 89260273 | 12719532 | 6.528 | 2.145 | 41613181 |
| plzip -7 -n 4 | 62.723 | 82944080 | 9290168 | 7.025 | 2.080 | 39876962 |
| plzip -8 -n 4 | 71.928 | 78477272 | 8101257 | 7.425 | 2.120 | 37017581 |
| plzip -9 -n 4 | 103.744 | 75648923 | 5616780 | 7.703 | 2.578 | 29344035 |
| zstd -1 | 2.057 | 145784609 | 283280117 | 3.997 | 0.639 | 228144928 |
| zstd -2 | 2.316 | 136049621 | 251600691 | 4.283 | 0.657 | 207077049 |
| zstd -3 | 2.733 | 127702753 | 213211562 | 4.563 | 0.650 | 196465774 |
| zstd -4 | 3.269 | 126224007 | 178252432 | 4.616 | 0.658 | 191829798 |
| zstd -5 | 5.136 | 122024478 | 113455452 | 4.775 | 0.680 | 179447762 |
| zstd -6 | 6.394 | 120035201 | 91133438 | 4.854 | 0.652 | 184103069 |
| zstd -7 | 8.510 | 116048780 | 68473231 | 5.021 | 0.612 | 189622190 |
| zstd -8 | 9.875 | 114821611 | 59008324 | 5.075 | 0.593 | 193628349 |
| zstd -9 | 12.478 | 113868149 | 46698766 | 5.117 | 0.588 | 193653315 |
| zstd -10 | 14.982 | 111113753 | 38893819 | 5.244 | 0.578 | 192238327 |
| zstd -11 | 16.391 | 110674252 | 35550436 | 5.265 | 0.583 | 189835767 |
| zstd -12 | 21.008 | 110031164 | 27737395 | 5.296 | 0.570 | 193037130 |
| zstd -13 | 51.259 | 109262475 | 11367900 | 5.333 | 0.561 | 194763770 |
| zstd -14 | 58.897 | 108632734 | 9893665 | 5.364 | 0.562 | 193296680 |
| zstd -15 | 82.514 | 107956132 | 7061919 | 5.398 | 0.557 | 193817113 |
| zstd -16 | 78.935 | 105533404 | 7382114 | 5.522 | 0.576 | 183217715 |
| zstd -17 | 89.832 | 94165409 | 6486633 | 6.188 | 0.565 | 166664441 |
| zstd -18 | 115.663 | 91124039 | 5037974 | 6.395 | 0.614 | 148410487 |
| zstd -19 | 157.008 | 90229137 | 3711322 | 6.458 | 0.614 | 146952992 |
| zstd -20 | 162.499 | 80742922 | 3585913 | 7.217 | 0.605 | 133459375 |
| zstd -21 | 207.122 | 79619348 | 2813353 | 7.319 | 0.611 | 130309899 |
| zstd -22 | 277.177 | 78652901 | 2102293 | 7.409 | 0.634 | 124058203 |
| pzstd -1 -p 4 | 0.621 | 146665510 | 938336876 | 3.973 | 0.196 | 748293418 |
| pzstd -2 -p 4 | 0.720 | 137416958 | 809315556 | 4.240 | 0.227 | 605361048 |
| pzstd -3 -p 4 | 1.180 | 128748806 | 493819661 | 4.526 | 0.231 | 557354139 |
| pzstd -4 -p 4 | 1.786 | 127373154 | 326263830 | 4.575 | 0.240 | 530721475 |
| pzstd -5 -p 4 | 2.635 | 123216422 | 221141252 | 4.729 | 0.240 | 513401758 |
| pzstd -6 -p 4 | 3.774 | 121257316 | 154400424 | 4.806 | 0.251 | 483096876 |
| pzstd -7 -p 4 | 3.988 | 117361187 | 146115145 | 4.965 | 0.263 | 446240255 |
| pzstd -8 -p 4 | 4.540 | 116172098 | 128349604 | 5.016 | 0.240 | 484050408 |
| pzstd -9 -p 4 | 5.083 | 115237287 | 114638442 | 5.057 | 0.268 | 429989877 |
| pzstd -10 -p 4 | 5.630 | 112359994 | 103500391 | 5.186 | 0.226 | 497168115 |
| pzstd -11 -p 4 | 5.991 | 111969711 | 97263762 | 5.204 | 0.246 | 455161427 |
| pzstd -12 -p 4 | 8.001 | 111326376 | 72829296 | 5.234 | 0.227 | 490424564 |
| pzstd -13 -p 4 | 16.035 | 110525395 | 36339707 | 5.272 | 0.259 | 426738977 |
| pzstd -14 -p 4 | 18.145 | 109957500 | 32113927 | 5.299 | 0.253 | 434614625 |
| pzstd -15 -p 4 | 24.791 | 109358520 | 23504788 | 5.328 | 0.224 | 488207679 |
| pzstd -16 -p 4 | 23.940 | 106888588 | 24340317 | 5.452 | 0.234 | 456788838 |
| pzstd -17 -p 4 | 29.099 | 97393935 | 20024991 | 5.983 | 0.266 | 366142613 |
| pzstd -18 -p 4 | 37.124 | 94273955 | 15696240 | 6.181 | 0.284 | 331950546 |
| pzstd -19 -p 4 | 48.798 | 93531545 | 11941211 | 6.230 | 0.262 | 356990630 |
| pzstd -20 -p 4 | 54.860 | 82067608 | 10621713 | 7.100 | 0.302 | 271747046 |
| pzstd -21 -p 4 | 64.179 | 79735488 | 9079406 | 7.308 | 0.389 | 204975548 |
| pzstd -22 -p 4 | 256.242 | 78688788 | 2274050 | 7.405 | 0.585 | 134510749 |
#+TBLFM: $4='(format "%d" (round (/ 582707200.0 $2)));N :: $5='(format "%.3f" (/ 582707200.0 $3));N :: $7='(format "%d" (round (/ $3 $6)));N

[-- Attachment #1.3: compression-benchmark.plot --]
[-- Type: text/plain, Size: 1631 bytes --]

set terminal png size 1920, 1080
set style data linespoints
set logscale y
set xlabel "Compression ratio"
set ylabel "Compression speed (MB/s)"
set output "compression.png"
plot '$datafile' every ::0::8 using 5:($4 / 1000000) linecolor "dark-violet" title "gzip", \
     '$datafile' every ::0::8 using 5:($4 / 1000000):(substr(stringcolumn(1), 7, 8)) with labels textcolor "dark-violet" offset -1, -1 notitle, \
     '$datafile' every ::18::27 using 5:($4 / 1000000) linecolor "navy" title "lzip", \
     '$datafile' every ::18::27 using 5:($4 / 1000000):(substr(stringcolumn(1), 7, 8)) with labels textcolor "navy" offset 0, -1 notitle, \
     '$datafile' every ::38::56 using 5:($4 / 1000000) linecolor "olive" title "zstd", \
     '$datafile' every ::38::56 using 5:($4 / 1000000):(substr(stringcolumn(1), 7, 9)) with labels textcolor "olive" offset 1, 1 notitle

set ylabel "Decompression speed (MB/s)"
set output "decompression.png"
plot '$datafile' every ::0::8 using 5:($7 / 1000000) linecolor "dark-violet" title "gzip", \
     '$datafile' every ::0::8 using 5:($7 / 1000000):(substr(stringcolumn(1), 7, 8)) with labels textcolor "dark-violet" offset 0, -1 notitle, \
     '$datafile' every ::18::27 using 5:($7 / 1000000) linecolor "navy" title "lzip", \
     '$datafile' every ::18::27 using 5:($7 / 1000000):(substr(stringcolumn(1), 7, 8)) with labels textcolor "navy" offset 0, -1 notitle, \
     '$datafile' every ::38::56 using 5:($7 / 1000000) linecolor "olive" title "zstd", \
     '$datafile' every ::38::56 using 5:($7 / 1000000):(substr(stringcolumn(1), 7, 8)) with labels textcolor "olive" offset 0, -1 notitle

[-- Attachment #1.4: compression.png --]
[-- Type: image/png, Size: 16056 bytes --]

[-- Attachment #1.5: decompression.png --]
[-- Type: image/png, Size: 12804 bytes --]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 247 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
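A sketch of the kind of measurement loop behind such a table (an
assumed methodology, not necessarily Guillaume's exact script; gzip is
shown, the other tools follow the same pattern):

--8<---------------cut here---------------start------------->8---
#!/bin/sh
# For each level, record compression time and output size; speed and
# ratio can then be derived as in the table (original size / time,
# original size / compressed size).
f=monero-0.17.1.5-pack.tar
for level in 1 2 3 4 5 6 7 8 9; do
    start=$(date +%s.%N)
    gzip -$level -c "$f" > "$f.gz"
    end=$(date +%s.%N)
    echo "gzip -$level: $(echo "$end - $start" | bc) s, $(stat -c %s "$f.gz") bytes"
done
--8<---------------cut here---------------end--------------->8---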
* Re: When substitute download + decompression is CPU-bound
  2021-01-07 10:45 ` Guillaume Le Vaillant
@ 2021-01-07 11:00   ` Pierre Neidhardt
  2021-01-07 11:33     ` Guillaume Le Vaillant
  0 siblings, 1 reply; 43+ messages in thread
From: Pierre Neidhardt @ 2021-01-07 11:00 UTC (permalink / raw)
  To: Guillaume Le Vaillant, Joshua Branson; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 392 bytes --]

Wow, impressive! :)

Guillaume Le Vaillant <glv@posteo.net> writes:

> Note that the plots only show the results using 1 thread and

Doesn't 1 thread defeat the purpose of parallel compression /
decompression?

> Machine used for the tests:
> - CPU: Intel i7-3630QM
> - RAM: 16 MiB

I suppose you meant 16 GiB ;)

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: When substitute download + decompression is CPU-bound
  2021-01-07 11:00 ` Pierre Neidhardt
@ 2021-01-07 11:33   ` Guillaume Le Vaillant
  0 siblings, 0 replies; 43+ messages in thread
From: Guillaume Le Vaillant @ 2021-01-07 11:33 UTC (permalink / raw)
  To: Pierre Neidhardt; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 643 bytes --]

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Wow, impressive! :)
>
> Guillaume Le Vaillant <glv@posteo.net> writes:
>
>> Note that the plots only show the results using 1 thread and
>
> Doesn't 1 thread defeat the purpose of parallel compression /
> decompression?

It was just to get a better idea of the relative compression and
decompression speeds of the algorithms.  When using n threads, if the
file is big enough, the speeds are almost multiplied by n and the
compression ratio is a little lower.

>> Machine used for the tests:
>> - CPU: Intel i7-3630QM
>> - RAM: 16 MiB
>
> I suppose you meant 16 GiB ;)

Yes, of course :)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 247 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: When substitute download + decompression is CPU-bound
  2021-01-07 10:45 ` Guillaume Le Vaillant
  2021-01-07 11:00   ` Pierre Neidhardt
@ 2021-01-14 21:51   ` Ludovic Courtès
  2021-01-14 22:08     ` Nicolò Balzarotti
  2021-01-15  8:10     ` When substitute download + decompression is CPU-bound Pierre Neidhardt
  1 sibling, 2 replies; 43+ messages in thread
From: Ludovic Courtès @ 2021-01-14 21:51 UTC (permalink / raw)
  To: Guillaume Le Vaillant; +Cc: guix-devel

Hi Guillaume,

Guillaume Le Vaillant <glv@posteo.net> skribis:

> I compared gzip, lzip and zstd when compressing a 580 MB pack
> (therefore containing "substitutes" for several packages) with
> different compression levels.  Maybe the results can be of some use
> to someone.

It’s insightful, thanks a lot!

One takeaway for me is that zstd decompression remains an order of
magnitude faster than the others, regardless of the compression level.

Another one is that at level 10 and higher zstd achieves compression
ratios that are more in the ballpark of lzip.

If we are to change the compression methods used at ci.guix.gnu.org, we
could use zstd >= 10.

We could also drop gzip, but there are probably pre-1.1 daemons out
there that understand nothing but gzip¹, so perhaps that’ll have to
wait.  Now, compressing substitutes three times may be somewhat
unreasonable.

Thoughts?

Ludo’.

¹ https://guix.gnu.org/en/blog/2020/gnu-guix-1.1.0-released/

^ permalink raw reply	[flat|nested] 43+ messages in thread
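Assuming zstd support lands in ‘guix publish’ alongside the existing
methods, configuring a server to serve both formats would presumably
boil down to something like the following sketch (the repeated ‘-C’
flags mirror the current gzip/lzip usage; the ‘zstd’ method name is an
assumption):

--8<---------------cut here---------------start------------->8---
# Hypothetical: publish each substitute both as lzip (small, for slow
# links) and as zstd (fast to decompress, for fast links).
guix publish --port=8080 -C lzip:9 -C zstd:19
--8<---------------cut here---------------end--------------->8---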
* Re: When substitute download + decompression is CPU-bound
  2021-01-14 21:51 ` Ludovic Courtès
@ 2021-01-14 22:08   ` Nicolò Balzarotti
  2021-01-28 17:53     ` Are gzip-compressed substitutes still used? Ludovic Courtès
  0 siblings, 1 reply; 43+ messages in thread
From: Nicolò Balzarotti @ 2021-01-14 22:08 UTC (permalink / raw)
  To: Ludovic Courtès, Guillaume Le Vaillant; +Cc: guix-devel

Hi Ludo,

Ludovic Courtès <ludo@gnu.org> writes:

> We could also drop gzip, but there are probably pre-1.1 daemons out
> there that understand nothing but gzip¹, so perhaps that’ll have to
> wait.  Now, compressing substitutes three times may be somewhat
> unreasonable.
>
> Thoughts?

Is there a request log where we can check whether this is true?

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Are gzip-compressed substitutes still used?
  2021-01-14 22:08 ` Nicolò Balzarotti
@ 2021-01-28 17:53   ` Ludovic Courtès
  2021-03-17 17:12     ` Ludovic Courtès
  0 siblings, 1 reply; 43+ messages in thread
From: Ludovic Courtès @ 2021-01-28 17:53 UTC (permalink / raw)
  To: Nicolò Balzarotti; +Cc: guix-devel

Hi Nicolò,

Nicolò Balzarotti <anothersms@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> We could also drop gzip, but there are probably pre-1.1 daemons out
>> there that understand nothing but gzip¹, so perhaps that’ll have to
>> wait.  Now, compressing substitutes three times may be somewhat
>> unreasonable.
>>
>> Thoughts?
>
> Is there a request log where we can check whether this is true?

I finally got around to checking this.

I picked a relatively popular substitute for which the lzip-compressed
variant is smaller than the gzip-compressed variant, and thus modern
‘guix substitute’ chooses lzip over gzip:

--8<---------------cut here---------------start------------->8---
$ wget -q -O - https://ci.guix.gnu.org/7rpj4dmn9g64zqp8vkc0byx93glix2pm.narinfo | head -7
StorePath: /gnu/store/7rpj4dmn9g64zqp8vkc0byx93glix2pm-gtk+-3.24.23
URL: nar/gzip/7rpj4dmn9g64zqp8vkc0byx93glix2pm-gtk%2B-3.24.23
Compression: gzip
FileSize: 13982949
URL: nar/lzip/7rpj4dmn9g64zqp8vkc0byx93glix2pm-gtk%2B-3.24.23
Compression: lzip
FileSize: 7223862
--8<---------------cut here---------------end--------------->8---

On berlin, I looked at the HTTPS nginx logs and did this:

--8<---------------cut here---------------start------------->8---
ludo@berlin /var/log/nginx$ tail -10000000 < https.access.log > /tmp/sample.log
ludo@berlin /var/log/nginx$ date
Thu 28 Jan 2021 06:18:01 PM CET
ludo@berlin /var/log/nginx$ grep /7rpj4dmn9g64zqp8vkc0byx93glix2pm-gtk < /tmp/sample.log |wc -l
1304
ludo@berlin /var/log/nginx$ grep /gzip/7rpj4dmn9g64zqp8vkc0byx93glix2pm-gtk < /tmp/sample.log |wc -l
17
ludo@berlin /var/log/nginx$ grep /lzip/7rpj4dmn9g64zqp8vkc0byx93glix2pm-gtk < /tmp/sample.log |wc -l
1287
--8<---------------cut here---------------end--------------->8---

The 10M-request sample covers requests from Jan. 10th to now.  Over
that period, 99% of the GTK+ downloads were made as lzip.

We see similar results with less popular packages and with core
packages:

--8<---------------cut here---------------start------------->8---
ludo@berlin /var/log/nginx$ grep /01xi3sig314wgwa1j9sxk37vl816mj74-r-minimal < /tmp/sample.log | wc -l
85
ludo@berlin /var/log/nginx$ grep /gzip/01xi3sig314wgwa1j9sxk37vl816mj74-r-minimal < /tmp/sample.log | wc -l
1
ludo@berlin /var/log/nginx$ grep /lzip/01xi3sig314wgwa1j9sxk37vl816mj74-r-minimal < /tmp/sample.log | wc -l
84
ludo@berlin /var/log/nginx$ grep /0m0vd873jp61lcm4xa3ljdgx381qa782-guile-3.0.2 < /tmp/sample.log |wc -l
1601
ludo@berlin /var/log/nginx$ grep /gzip/0m0vd873jp61lcm4xa3ljdgx381qa782-guile-3.0.2 < /tmp/sample.log |wc -l
8
ludo@berlin /var/log/nginx$ grep /lzip/0m0vd873jp61lcm4xa3ljdgx381qa782-guile-3.0.2 < /tmp/sample.log |wc -l
1593
--8<---------------cut here---------------end--------------->8---

From that, we could deduce that about 1% of our users who take
substitutes from ci.guix are still using a pre-1.1.0 daemon without
support for lzip compression.

I find it surprisingly low: 1.1.0 was released “only” 9 months ago,
which is not a lot for someone used to the long release cycles of
“stable” distros.

It might be underestimated: users running an old daemon probably update
less often and may thus be underrepresented in the substitute logs.

As for whether it’s OK to drop gzip substitutes altogether: I’m not
confident about knowingly breaking 1% or more of the deployed Guixes,
but it’s all about tradeoffs.

Ludo’.

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: Are gzip-compressed substitutes still used?
  2021-01-28 17:53 ` Are gzip-compressed substitutes still used? Ludovic Courtès
@ 2021-03-17 17:12   ` Ludovic Courtès
  2021-03-17 17:33     ` Léo Le Bouter
  ` (3 more replies)
  0 siblings, 4 replies; 43+ messages in thread
From: Ludovic Courtès @ 2021-03-17 17:12 UTC (permalink / raw)
  To: guix-devel

[-- Attachment #1: Type: text/plain, Size: 2581 bytes --]

Hi,

Ludovic Courtès <ludo@gnu.org> skribis:

> From that, we could deduce that about 1% of our users who take
> substitutes from ci.guix are still using a pre-1.1.0 daemon without
> support for lzip compression.
>
> I find it surprisingly low: 1.1.0 was released “only” 9 months ago,
> which is not a lot for someone used to the long release cycles of
> “stable” distros.

(See
<https://lists.gnu.org/archive/html/guix-devel/2021-01/msg00378.html>
for the initial message.)

Here’s an update, 1.5 months later.  This time I’m looking at nginx
logs covering Feb 8th to Mar 17th and using a laxer regexp than in the
message above.  Here are the gzip/lzip download ratios for several
packages:

--8<---------------cut here---------------start------------->8---
ludo@berlin ~$ ./nar-download-stats.sh /tmp/sample3.log
gtk%2B-3: gzip/lzip ratio: 37/3255 1%
glib-2: gzip/lzip ratio: 97/8629 1%
coreutils-8: gzip/lzip ratio: 81/2306 3%
python-3: gzip/lzip ratio: 120/7177 1%
r-minimal-[34]: gzip/lzip ratio: 8/302 2%
openmpi-4: gzip/lzip ratio: 19/236 8%
hwloc-2: gzip/lzip ratio: 10/43 23%
gfortran-7: gzip/lzip ratio: 6/225 2%
--8<---------------cut here---------------end--------------->8---

(Script attached.)

The hwloc/openmpi outlier is intriguing.  Is it one HPC site running an
old daemon, or several of them?  Looking more closely, it’s 22 of them
on 8 different networks (looking at the first three octets of the IP
address):

--8<---------------cut here---------------start------------->8---
ludo@berlin ~$ grep -E '/gzip/[[:alnum:]]{32}-(hwloc-2|openmpi-4)\.[[:digit:]]+\.[[:digit:]]+ ' < /tmp/sample3.log | cut -f1 -d- | sort -u | wc -l
22
ludo@berlin ~$ grep -E '/gzip/[[:alnum:]]{32}-(hwloc-2|openmpi-4)\.[[:digit:]]+\.[[:digit:]]+ ' < /tmp/sample3.log | cut -f1 -d- | cut -f 1-3 -d. | sort -u | wc -l
8
--8<---------------cut here---------------end--------------->8---

Conclusion?  It still sounds like we can’t reasonably remove gzip
support just yet.

I’d still like to start providing zstd-compressed substitutes though.
So I think what we can do is:

  • start providing zstd substitutes on berlin right now so that when
    1.2.1 comes out, at least some substitutes are available as zstd;

  • when 1.2.1 is announced, announce that gzip substitutes may be
    removed in the future and invite users to upgrade;

  • revisit this issue with an eye on dropping gzip within 6–18 months.

Thoughts?

Ludo’.

[-- Attachment #2: the script --]
[-- Type: text/plain, Size: 605 bytes --]

#!/bin/sh

if [ ! "$#" = 1 ]
then
    echo "Usage: $0 NGINX-LOG-FILE"
    exit 1
fi

set -e

sample="$1"
items="gtk%2B-3 glib-2 coreutils-8 python-3 r-minimal-[34] openmpi-4 hwloc-2 gfortran-7"

for i in $items
do
    # Tweak the regexp so we don't catch ".drv" substitutes as these
    # usually compress better with gzip.
    lzip="$(grep -E "/lzip/[[:alnum:]]{32}-$i\\.[[:digit:]]+(\\.[[:digit:]]+)? " < "$sample" | wc -l)"
    gzip="$(grep -E "/gzip/[[:alnum:]]{32}-$i\\.[[:digit:]]+(\\.[[:digit:]]+)? " < "$sample" | wc -l)"
    echo "$i: gzip/lzip ratio: $gzip/$lzip $(($gzip * 100 / $lzip))%"
done

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: Are gzip-compressed substitutes still used?
  2021-03-17 17:12 ` Ludovic Courtès
@ 2021-03-17 17:33   ` Léo Le Bouter
  2021-03-17 18:08     ` Vagrant Cascadian
  0 siblings, 1 reply; 43+ messages in thread
From: Léo Le Bouter @ 2021-03-17 17:33 UTC (permalink / raw)
  To: Ludovic Courtès, guix-devel

[-- Attachment #1: Type: text/plain, Size: 220 bytes --]

Just as a reminder, siding with vagrantc here:

We must ensure the Debian 'guix' package can still work and upgrade
from its installed version, so make sure that removing gzip doesn't
break an initial 'guix pull' with it.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: Are gzip-compressed substitutes still used?
  2021-03-17 17:33 ` Léo Le Bouter
@ 2021-03-17 18:08   ` Vagrant Cascadian
  2021-03-18  0:03     ` zimoun
  2021-03-20 11:23     ` Ludovic Courtès
  0 siblings, 2 replies; 43+ messages in thread
From: Vagrant Cascadian @ 2021-03-17 18:08 UTC (permalink / raw)
  To: Léo Le Bouter, Ludovic Courtès, guix-devel

[-- Attachment #1: Type: text/plain, Size: 756 bytes --]

On 2021-03-17, Léo Le Bouter wrote:
> Just as a reminder, siding with vagrantc here:
>
> We must ensure the Debian 'guix' package can still work and upgrade
> from its installed version, so make sure that removing gzip doesn't
> break an initial 'guix pull' with it.

... and I would expect this version to ship in Debian for another ~3-5
years, unless it gets removed from Debian bullseye before the upcoming
(real soon now) release!

But if lzip substitutes are still supported, I *think* guix 1.2.0 as
packaged in Debian still supports that, at least.

Dropping both gzip and lzip would be unfortunate; I don't think it
would be trivial to backport the zstd patches to guix 1.2.0, as it also
depends on guile-zstd?

live well,
  vagrant

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: Are gzip-compressed substitutes still used?
  2021-03-17 18:08 ` Vagrant Cascadian
@ 2021-03-18  0:03   ` zimoun
  2021-03-18 16:00     ` Vagrant Cascadian
  0 siblings, 1 reply; 43+ messages in thread
From: zimoun @ 2021-03-18 0:03 UTC (permalink / raw)
  To: Vagrant Cascadian, Léo Le Bouter, Ludovic Courtès, guix-devel

Hi Vagrant,

On Wed, 17 Mar 2021 at 11:08, Vagrant Cascadian <vagrant@debian.org> wrote:

> ... and I would expect this version to ship in Debian for another ~3-5
> years, unless it gets removed from Debian bullseye before the upcoming
> (real soon now) release!

I may be missing a point.  In 3-5 years, some people will still be
running Debian stable (or maybe oldstable, or maybe this stable is
LTS), so they will “apt install guix” at 1.2.0, right?  But then there
is no guarantee that Berlin will still serve binary substitutes for
this 5-year-old release.  “guix install” falls back to compiling what
is missing, though, right?

Then the question will be: are the upstream sources still available?
Assuming that SWH is still alive at that point, all the git-fetch
packages will have their source, whatever the upstream status.  For
all the other methods, there is no guarantee.

On the other hand, in that 3-5 year future, after “apt install guix”,
people will not run “guix install” directly; instead they should run
“guix pull” first.  Therefore, the compression of substitutes does not
matter that much, right?

The only strong backward-compatibility requirement seems to be “guix
pull” rather than the substitutes themselves, doesn't it?  In other
words, at least keep everything necessary for “guix pull” at 1.2.0 to
complete.

Thanks for this opportunity to think at such a time scale. :-)

Cheers,
simon

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: Are gzip-compressed substitutes still used?
  2021-03-18  0:03 ` zimoun
@ 2021-03-18 16:00   ` Vagrant Cascadian
  2021-03-18 18:53     ` Leo Famulari
  0 siblings, 1 reply; 43+ messages in thread
From: Vagrant Cascadian @ 2021-03-18 16:00 UTC (permalink / raw)
  To: zimoun, Léo Le Bouter, Ludovic Courtès, guix-devel

[-- Attachment #1: Type: text/plain, Size: 2668 bytes --]

On 2021-03-18, zimoun wrote:
> On Wed, 17 Mar 2021 at 11:08, Vagrant Cascadian <vagrant@debian.org> wrote:
>
>> ... and I would expect this version to ship in Debian for another ~3-5
>> years, unless it gets removed from Debian bullseye before the upcoming
>> (real soon now) release!
>
> I may be missing a point.  In 3-5 years, some people will still be
> running Debian stable (or maybe oldstable, or maybe this stable is
> LTS), so they will “apt install guix” at 1.2.0, right?  But then there
> is no guarantee that Berlin will still serve binary substitutes for
> this 5-year-old release.  “guix install” falls back to compiling what
> is missing, though, right?

Sure.

> Then the question will be: are the upstream sources still available?
> Assuming that SWH is still alive at that point, all the git-fetch
> packages will have their source, whatever the upstream status.  For
> all the other methods, there is no guarantee.

There is never a guarantee of source availability from third parties;
that is one of the downsides of the Guix approach to source management
vs. Debian (e.g. all released sources are mirrored on Debian-controlled
infrastructure ... which brings up an interesting aside: could the
Debian, OpenSuSE, Fedora, etc. archives be treated as a fallback mirror
for upstream tarballs?).

> On the other hand, in that 3-5 year future, after “apt install guix”,
> people will not run “guix install” directly; instead they should run
> “guix pull” first.  Therefore, the compression of substitutes does not
> matter that much, right?

Except that issues like the openssl bug, which causes a build failure
due to certificate expiry in the test suite, would basically break guix
pull in those cases... maybe that is a deal breaker for the
Debian-packaged guix...

> The only strong backward-compatibility requirement seems to be “guix
> pull” rather than the substitutes themselves, doesn't it?  In other
> words, at least keep everything necessary for “guix pull” at 1.2.0 to
> complete.

The guix-daemon is still run from the packaged version installed as
/usr/bin/guix-daemon, so it would need to be patched to get updates for
new features and ... in light of https://issues.guix.gnu.org/47229 ...
security updates!

It is of course possible to configure an updated guix-daemon from a
user's profile (e.g. as recommended with guix-binary installation on a
foreign distro), but out-of-the-box it uses the guix-daemon shipped in
the package, which, at least with my Debian hat on, is how it should
be.

> Thanks for this opportunity to think at such a time scale. :-)

Heh. :)

live well,
  vagrant

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: Are gzip-compressed substitutes still used?
  2021-03-18 16:00 ` Vagrant Cascadian
@ 2021-03-18 18:53   ` Leo Famulari
  0 siblings, 0 replies; 43+ messages in thread
From: Leo Famulari @ 2021-03-18 18:53 UTC (permalink / raw)
  To: Vagrant Cascadian; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 376 bytes --]

On Thu, Mar 18, 2021 at 09:00:20AM -0700, Vagrant Cascadian wrote:
> Except that issues like the openssl bug, which causes a build failure
> due to certificate expiry in the test suite, would basically break
> guix pull in those cases... maybe that is a deal breaker for the
> Debian-packaged guix...

To clarify, this bug was in GnuTLS, not OpenSSL:
<https://bugs.gnu.org/44559>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: Are gzip-compressed substitutes still used?
  2021-03-17 18:08 ` Vagrant Cascadian
  2021-03-18  0:03   ` zimoun
@ 2021-03-20 11:23   ` Ludovic Courtès
  0 siblings, 0 replies; 43+ messages in thread
From: Ludovic Courtès @ 2021-03-20 11:23 UTC (permalink / raw)
  To: Vagrant Cascadian; +Cc: guix-devel

Vagrant Cascadian <vagrant@debian.org> skribis:

> On 2021-03-17, Léo Le Bouter wrote:
>> Just as a reminder, siding with vagrantc here:
>>
>> We must ensure the Debian 'guix' package can still work and upgrade
>> from its installed version, so make sure that removing gzip doesn't
>> break an initial 'guix pull' with it.
>
> ... and I would expect this version to ship in Debian for another ~3-5
> years, unless it gets removed from Debian bullseye before the upcoming
> (real soon now) release!
>
> But if lzip substitutes are still supported, I *think* guix 1.2.0 as
> packaged in Debian still supports that, at least.
>
> Dropping both gzip and lzip would be unfortunate; I don't think it
> would be trivial to backport the zstd patches to guix 1.2.0, as it
> also depends on guile-zstd?

Indeed.  But don’t worry: we wouldn’t drop both gzip and lzip at once!

Ludo’.

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: Are gzip-compressed substitutes still used?
  2021-03-17 17:12   ` Ludovic Courtès
  2021-03-17 17:33     ` Léo Le Bouter
@ 2021-03-17 18:06     ` zimoun
  2021-03-17 18:20     ` Jonathan Brielmaier
  2021-03-18 17:25     ` Pierre Neidhardt
  3 siblings, 0 replies; 43+ messages in thread
From: zimoun @ 2021-03-17 18:06 UTC (permalink / raw)
To: Ludovic Courtès, guix-devel

Hi,

On Wed, 17 Mar 2021 at 18:12, Ludovic Courtès <ludo@gnu.org> wrote:

> I’d still like to start providing zstd-compressed substitutes though.
> So I think what we can do is:
>
>   • start providing zstd substitutes on berlin right now so that when
>     1.2.1 comes out, at least some substitutes are available as zstd;
>
>   • when 1.2.1 is announced, announce that gzip substitutes may be
>     removed in the future and invite users to upgrade;
>
>   • revisit this issue with an eye on dropping gzip within 6–18 months.

Sounds reasonable.  The full removal could be announced for the 1.4
release, even if we do not know when that will happen. ;-)  That way,
people know what to expect.

Cheers,
simon

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: Are gzip-compressed substitutes still used?
  2021-03-17 17:12   ` Ludovic Courtès
  2021-03-17 17:33     ` Léo Le Bouter
  2021-03-17 18:06     ` zimoun
@ 2021-03-17 18:20     ` Jonathan Brielmaier
  2021-03-18 17:25     ` Pierre Neidhardt
  3 siblings, 0 replies; 43+ messages in thread
From: Jonathan Brielmaier @ 2021-03-17 18:20 UTC (permalink / raw)
To: guix-devel

On 17.03.21 18:12, Ludovic Courtès wrote:
> (See
> <https://lists.gnu.org/archive/html/guix-devel/2021-01/msg00378.html>
> for the initial message.)
>
> Here’s an update, 1.5 months later.  This time I’m looking at nginx
> logs covering Feb 8th to Mar 17th and using a laxer regexp than in the
> message above.  Here are the gzip/lzip download ratios for several
> packages:
>
> --8<---------------cut here---------------start------------->8---
> ludo@berlin ~$ ./nar-download-stats.sh /tmp/sample3.log
> gtk%2B-3:        gzip/lzip ratio: 37/3255    1%
> glib-2:          gzip/lzip ratio: 97/8629    1%
> coreutils-8:     gzip/lzip ratio: 81/2306    3%
> python-3:        gzip/lzip ratio: 120/7177   1%
> r-minimal-[34]:  gzip/lzip ratio: 8/302      2%
> openmpi-4:       gzip/lzip ratio: 19/236     8%
> hwloc-2:         gzip/lzip ratio: 10/43     23%
> gfortran-7:      gzip/lzip ratio: 6/225      2%
> --8<---------------cut here---------------end--------------->8---

Interesting findings...

> Conclusion?  It still sounds like we can’t reasonably remove gzip
> support just yet.
>
> I’d still like to start providing zstd-compressed substitutes though.
> So I think what we can do is:
>
>   • start providing zstd substitutes on berlin right now so that when
>     1.2.1 comes out, at least some substitutes are available as zstd;
>
>   • when 1.2.1 is announced, announce that gzip substitutes may be
>     removed in the future and invite users to upgrade;

My personal substitute server runs with lzip + zstd, so no gzip.  It
works fine, but I didn't have any "legacy" users.  Zstd accounts for
only 0.6% of lzip's total downloads over the last months, though...

^ permalink raw reply	[flat|nested] 43+ messages in thread
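(For anyone wanting to run a similar setup: a minimal sketch of such a
server, assuming a Guix recent enough that 'guix publish' has zstd
support and accepts multiple '--compression' flags; the port number is
arbitrary:)

--8<---------------cut here---------------start------------->8---
# Serve nars as both lzip -9 and zstd -19, but no gzip; each client
# fetches the best format its daemon understands.
guix publish --port=8080 --compression=lzip:9 --compression=zstd:19
--8<---------------cut here---------------end--------------->8---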
* Re: Are gzip-compressed substitutes still used?
  2021-03-17 17:12   ` Ludovic Courtès
                       ` (2 preceding siblings ...)
  2021-03-17 18:20     ` Jonathan Brielmaier
@ 2021-03-18 17:25     ` Pierre Neidhardt
  3 siblings, 0 replies; 43+ messages in thread
From: Pierre Neidhardt @ 2021-03-18 17:25 UTC (permalink / raw)
To: Ludovic Courtès, guix-devel

[-- Attachment #1: Type: text/plain, Size: 727 bytes --]

Hi Ludo!

On a side note, the following shell incantations

> --8<---------------cut here---------------start------------->8---
> ludo@berlin ~$ grep -E '/gzip/[[:alnum:]]{32}-(hwloc-2|openmpi-4)\.[[:digit:]]+\.[[:digit:]]+ ' < /tmp/sample3.log | cut -f1 -d- | sort -u | wc -l
> 22
> ludo@berlin ~$ grep -E '/gzip/[[:alnum:]]{32}-(hwloc-2|openmpi-4)\.[[:digit:]]+\.[[:digit:]]+ ' < /tmp/sample3.log | cut -f1 -d- | cut -f 1-3 -d. | sort -u | wc -l
> 8
> --8<---------------cut here---------------end--------------->8---

are perfect examples of why it's high time we moved to a better shell
language :D

https://ambrevar.xyz/lisp-repl-shell/index.html

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: When substitute download + decompression is CPU-bound
  2021-01-14 21:51       ` Ludovic Courtès
  2021-01-14 22:08         ` Nicolò Balzarotti
@ 2021-01-15  8:10         ` Pierre Neidhardt
  2021-01-28 17:58           ` Ludovic Courtès
  1 sibling, 1 reply; 43+ messages in thread
From: Pierre Neidhardt @ 2021-01-15  8:10 UTC (permalink / raw)
To: Ludovic Courtès, Guillaume Le Vaillant; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1443 bytes --]

Ludovic Courtès <ludo@gnu.org> writes:

> One takeaway for me is that zstd decompression remains an order of
> magnitude faster than the others, regardless of the compression level.
>
> Another one is that at level 10 and higher, zstd achieves compression
> ratios that are more in the ballpark of lzip.

Hmmm, this is roughly true for lzip < level 6, but as soon as lzip hits
level 6 (the default!), it compresses up to twice as much!

> If we are to change the compression methods used at ci.guix.gnu.org,
> we could use zstd >= 10.

On Guillaume's graph, the compression speed at the default level 3 is
about 110 MB/s, while at level 10 it's about 40 MB/s, which is
approximately the gzip speed.

If server compression time does not matter, then I agree, level >= 10
would be a good option.

What about zstd level 19, then?  It's as slow as lzip to compress, but
it still decompresses blazingly fast, which is what we are trying to
achieve here, _while_ offering a compression ratio in the ballpark of
lzip level 6 (but still not that of lzip level 9).

> We could also drop gzip, but there are probably pre-1.1 daemons out
> there that understand nothing but gzip¹, so perhaps that’ll have to
> wait.  Now, compressing substitutes three times may be somewhat
> unreasonable.

Agreed; maybe release an announcement and give it a few months to a
year?

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
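(To reproduce the level comparison locally, a minimal sketch, assuming
the decompressed uc.nar from the earlier benchmark is in the current
directory and zstd/lzip are installed:)

--8<---------------cut here---------------start------------->8---
# Compression ratio and decompression speed across zstd levels:
for level in 3 10 19; do
    zstd -$level -f uc.nar -o uc.nar.$level.zst
    du -h uc.nar.$level.zst
    time zstd -d < uc.nar.$level.zst > /dev/null
done

# lzip baselines at the default level (-6) and at --best (-9), the
# level used on ci.guix.gnu.org:
for level in 6 9; do
    cp uc.nar uc.nar.$level
    lzip -$level -f uc.nar.$level       # produces uc.nar.$level.lz
    du -h uc.nar.$level.lz
    time lzip -d < uc.nar.$level.lz > /dev/null
done
--8<---------------cut here---------------end--------------->8---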
* Re: When substitute download + decompression is CPU-bound
  2021-01-15  8:10         ` When substitute download + decompression is CPU-bound Pierre Neidhardt
@ 2021-01-28 17:58           ` Ludovic Courtès
  2021-01-29  9:45             ` Pierre Neidhardt
  2021-01-29 13:33             ` zimoun
  0 siblings, 2 replies; 43+ messages in thread
From: Ludovic Courtès @ 2021-01-28 17:58 UTC (permalink / raw)
To: Pierre Neidhardt; +Cc: guix-devel

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> On Guillaume's graph, the compression speed at the default level 3 is
> about 110 MB/s, while at level 10 it's about 40 MB/s, which is
> approximately the gzip speed.
>
> If server compression time does not matter, then I agree, level >= 10
> would be a good option.
>
> What about zstd level 19, then?  It's as slow as lzip to compress, but
> it still decompresses blazingly fast, which is what we are trying to
> achieve here, _while_ offering a compression ratio in the ballpark of
> lzip level 6 (but still not that of lzip level 9).

We could do that.  I suppose a possible agenda would be:

  1. Start providing zstd substitutes anytime.  However, most clients
     will keep choosing lzip because it usually compresses better.

  2. After the next release, stop providing lzip substitutes and
     provide only gzip + zstd-19.

This option has the advantage that it wouldn’t break any installation.
It’s not as nice as the ability to choose a download strategy, as we
discussed earlier, but implementing that download strategy sounds
tricky.

Thoughts?

Ludo’.

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: When substitute download + decompression is CPU-bound
  2021-01-28 17:58           ` Ludovic Courtès
@ 2021-01-29  9:45             ` Pierre Neidhardt
  2021-01-29 11:23               ` Guillaume Le Vaillant
  1 sibling, 1 reply; 43+ messages in thread
From: Pierre Neidhardt @ 2021-01-29  9:45 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1837 bytes --]

Hi Ludo!

Ludovic Courtès <ludo@gnu.org> writes:

>> On Guillaume's graph, the compression speed at the default level 3 is
>> about 110 MB/s, while at level 10 it's about 40 MB/s, which is
>> approximately the gzip speed.
>>
>> If server compression time does not matter, then I agree, level >= 10
>> would be a good option.
>>
>> What about zstd level 19, then?  It's as slow as lzip to compress,
>> but it still decompresses blazingly fast, which is what we are trying
>> to achieve here, _while_ offering a compression ratio in the ballpark
>> of lzip level 6 (but still not that of lzip level 9).
>
> We could do that.  I suppose a possible agenda would be:
>
>   1. Start providing zstd substitutes anytime.  However, most clients
>      will keep choosing lzip because it usually compresses better.
>
>   2. After the next release, stop providing lzip substitutes and
>      provide only gzip + zstd-19.
>
> This option has the advantage that it wouldn’t break any installation.

But why would we keep gzip, since it offers no benefits compared to
zstd?  It feels like continuing to carry a (huge) burden forever...

Besides, dropping Lzip seems like a step backward in my opinion.  Users
with lower bandwidth (or simply further away from Berlin) will be
impacted a lot.

I would opt for dropping gzip instead, keeping only zstd-19 and lzip-9
(possibly plzip-9 if we update the bindings).

> It’s not as nice as the ability to choose a download strategy, as we
> discussed earlier, but implementing that download strategy sounds
> tricky.

If the user can choose their favourite substitute compression, I believe
that is usually enough, since they are the best judge of their bandwidth
/ hardware requirements.

Wouldn't this be simple enough?

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: When substitute download + decompression is CPU-bound
  2021-01-29  9:45             ` Pierre Neidhardt
@ 2021-01-29 11:23               ` Guillaume Le Vaillant
  2021-01-29 11:55                 ` Nicolò Balzarotti
  2021-02-01 22:18                 ` Ludovic Courtès
  0 siblings, 2 replies; 43+ messages in thread
From: Guillaume Le Vaillant @ 2021-01-29 11:23 UTC (permalink / raw)
To: Pierre Neidhardt; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 3571 bytes --]

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Hi Ludo!
>
> Ludovic Courtès <ludo@gnu.org> writes:
>
>> I suppose a possible agenda would be:
>>
>>   1. Start providing zstd substitutes anytime.  However, most clients
>>      will keep choosing lzip because it usually compresses better.
>>
>>   2. After the next release, stop providing lzip substitutes and
>>      provide only gzip + zstd-19.
>>
>> This option has the advantage that it wouldn’t break any installation.
>
> But why would we keep gzip, since it offers no benefits compared to
> zstd?  It feels like continuing to carry a (huge) burden forever...
>
> Besides, dropping Lzip seems like a step backward in my opinion.  Users
> with lower bandwidth (or simply further away from Berlin) will be
> impacted a lot.
>
> I would opt for dropping gzip instead, keeping only zstd-19 and lzip-9
> (possibly plzip-9 if we update the bindings).
>
>> It’s not as nice as the ability to choose a download strategy, as we
>> discussed earlier, but implementing that download strategy sounds
>> tricky.
>
> If the user can choose their favourite substitute compression, I believe
> that is usually enough, since they are the best judge of their bandwidth
> / hardware requirements.
>
> Wouldn't this be simple enough?

Here are a few numbers for the installation time in seconds (download
time + decompression time) when fetching 580 MB of substitutes, for
download speeds between 0.5 MB/s and 20 MB/s.

| Download speed (MB/s) | gzip -9 | lzip -9 | zstd -19 |
|-----------------------+---------+---------+----------|
|                   0.5 |     287 |     151 |      181 |
|                   1.0 |     144 |      78 |       91 |
|                   1.5 |      97 |      54 |       61 |
|                   2.0 |      73 |      42 |       46 |
|                   2.5 |      59 |      35 |       37 |
|                   3.0 |      49 |      30 |       31 |
|                   3.5 |      42 |      27 |       26 |
|                   4.0 |      37 |      24 |       23 |
|                   4.5 |      33 |      22 |       21 |
|                   5.0 |      30 |      21 |       19 |
|                   5.5 |      28 |      19 |       17 |
|                   6.0 |      25 |      18 |       16 |
|                   6.5 |      24 |      17 |       14 |
|                   7.0 |      22 |      17 |       14 |
|                   7.5 |      21 |      16 |       13 |
|                   8.0 |      20 |      15 |       12 |
|                   8.5 |      18 |      15 |       11 |
|                   9.0 |      18 |      14 |       11 |
|                   9.5 |      17 |      14 |       10 |
|                  10.0 |      16 |      13 |       10 |
|                  11.0 |      15 |      13 |        9 |
|                  12.0 |      14 |      12 |        8 |
|                  13.0 |      13 |      12 |        8 |
|                  14.0 |      12 |      11 |        7 |
|                  15.0 |      11 |      11 |        7 |
|                  16.0 |      11 |      11 |        6 |
|                  17.0 |      10 |      10 |        6 |
|                  18.0 |      10 |      10 |        6 |
|                  19.0 |       9 |      10 |        5 |
|                  20.0 |       9 |      10 |        5 |

When the download speed is lower than 3.5 MB/s, Lzip is better, and
above that speed Zstd is better.

As Gzip is never the best choice, it would make sense to drop it, even
if we have to wait a little until everyone has updated their Guix daemon
to a version with at least Lzip support.

I think there are many people (like me) with a download speed slower
than 3 MB/s, so like Pierre I would prefer keeping "lzip -9" and
"zstd -19".

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 247 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
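(These numbers follow the model time = compressed_size / download_speed
+ decompression_time.  A minimal sketch to recompute rows of the table;
the sizes and decompression times below are rough values back-fitted
from the table, not Guillaume's raw measurements, so treat them as
assumptions:)

--8<---------------cut here---------------start------------->8---
#!/bin/sh
# time(speed) = compressed_size / download_speed + decompression_time
# Back-fitted inputs for the 580 MB substitute set:
#   gzip -9:  ~142 MB compressed, ~3 s to decompress
#   lzip -9:   ~72 MB compressed, ~7 s to decompress
#   zstd -19:  ~90 MB compressed, ~1 s to decompress
for speed in 0.5 1 2 3.5 8 20; do
    awk -v s="$speed" 'BEGIN {
        printf "%4.1f MB/s  gzip: %3.0f s  lzip: %3.0f s  zstd: %3.0f s\n",
               s, 142/s + 3, 72/s + 7, 90/s + 1 }'
done
--8<---------------cut here---------------end--------------->8---

Setting 72/s + 7 = 90/s + 1 gives s = 3 MB/s with these rounded inputs,
close to the ~3.5 MB/s lzip/zstd crossover visible in the table.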
* Re: When substitute download + decompression is CPU-bound
  2021-01-29 11:23               ` Guillaume Le Vaillant
@ 2021-01-29 11:55                 ` Nicolò Balzarotti
  2021-01-29 12:13                   ` Pierre Neidhardt
  1 sibling, 1 reply; 43+ messages in thread
From: Nicolò Balzarotti @ 2021-01-29 11:55 UTC (permalink / raw)
To: Guillaume Le Vaillant, Pierre Neidhardt; +Cc: guix-devel

Guillaume Le Vaillant <glv@posteo.net> writes:

> Here are a few numbers for the installation time in seconds (download
> time + decompression time) when fetching 580 MB of substitutes, for
> download speeds between 0.5 MB/s and 20 MB/s.

Which hardware did you use?  Since you are fixing the download speed,
those results really depend on CPU speed.

> As Gzip is never the best choice, it would make sense to drop it, even
> if we have to wait a little until everyone has updated their Guix daemon

My hypothesis is that this won't be the case on something slow like the
Raspberry Pi 1.

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: When substitute download + decompression is CPU-bound
  2021-01-29 11:55                 ` Nicolò Balzarotti
@ 2021-01-29 12:13                   ` Pierre Neidhardt
  2021-01-29 13:06                     ` Guillaume Le Vaillant
  2021-01-29 14:55                     ` Nicolò Balzarotti
  0 siblings, 2 replies; 43+ messages in thread
From: Pierre Neidhardt @ 2021-01-29 12:13 UTC (permalink / raw)
To: Nicolò Balzarotti, Guillaume Le Vaillant; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 489 bytes --]

Nicolò Balzarotti <anothersms@gmail.com> writes:

>> As Gzip is never the best choice, it would make sense to drop it, even
>> if we have to wait a little until everyone has updated their Guix daemon
>
> My hypothesis is that this won't be the case on something slow like the
> Raspberry Pi 1.

What wouldn't be the case?  If you mean that "gzip is never the best
choice", wouldn't Zstd outperform gzip on the Raspberry Pi 1 too?

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 511 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: When substitute download + decompression is CPU-bound
  2021-01-29 12:13                   ` Pierre Neidhardt
@ 2021-01-29 13:06                     ` Guillaume Le Vaillant
  1 sibling, 0 replies; 43+ messages in thread
From: Guillaume Le Vaillant @ 2021-01-29 13:06 UTC (permalink / raw)
To: Pierre Neidhardt; +Cc: guix-devel, Nicolò Balzarotti

[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]

Nicolò Balzarotti <anothersms@gmail.com> skribis:

> Which hardware did you use?  Since you are fixing the download speed,
> those results really depend on CPU speed.

I ran these tests on a laptop from 2012 with an Intel i7-3630QM CPU.

When the CPU speed increases, the download speed limit below which Lzip
is the best choice also increases.  For example, in my test Lzip is the
best choice if the download speed is below 3.5 MB/s.  With a CPU running
twice as fast, Lzip is the best choice when the download speed is below
6.5 MB/s.

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

> Nicolò Balzarotti <anothersms@gmail.com> writes:
>
>>> As Gzip is never the best choice, it would make sense to drop it, even
>>> if we have to wait a little until everyone has updated their Guix daemon
>>
>> My hypothesis is that this won't be the case on something slow like the
>> Raspberry Pi 1.
>
> What wouldn't be the case?  If you mean that "gzip is never the best
> choice", wouldn't Zstd outperform gzip on the Raspberry Pi 1 too?

I saw a compression benchmark somewhere on the internet (I can't
remember where right now) where Gzip decompression on a Raspberry Pi 2
was around 40 MB/s, and Zstd decompression was around 50 MB/s.  I guess
Zstd will also be faster than Gzip on a Raspberry Pi 1.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 247 bytes --]

^ permalink raw reply	[flat|nested] 43+ messages in thread
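(To check which codec wins on a given machine, raw decompression
throughput can be measured directly.  A minimal sketch, assuming the
nars from earlier in the thread are at hand and that a zstd copy has
been created locally, e.g. with "zstd -19 uc.nar -o uc.nar.zstd":)

--8<---------------cut here---------------start------------->8---
# Wall-clock time per codec; throughput = uncompressed bytes / seconds.
for f in uc.nar.gz uc.nar.lz uc.nar.zstd; do
    case $f in
        *.gz)   cmd="gunzip -c" ;;
        *.lz)   cmd="lzip -d"   ;;
        *.zstd) cmd="zstd -dc"  ;;
    esac
    echo "== $f"
    time $cmd < "$f" > /dev/null
done
--8<---------------cut here---------------end--------------->8---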
* Re: When substitute download + decompression is CPU-bound
  2021-01-29 12:13                   ` Pierre Neidhardt
  2021-01-29 13:06                     ` Guillaume Le Vaillant
@ 2021-01-29 14:55                     ` Nicolò Balzarotti
  1 sibling, 0 replies; 43+ messages in thread
From: Nicolò Balzarotti @ 2021-01-29 14:55 UTC (permalink / raw)
To: Pierre Neidhardt, Guillaume Le Vaillant; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 571 bytes --]

Pierre Neidhardt <mail@ambrevar.xyz> writes:

> Nicolò Balzarotti <anothersms@gmail.com> writes:
>
> What wouldn't be the case?  If you mean that "gzip is never the best
> choice", wouldn't Zstd outperform gzip on the Raspberry Pi 1 too?

My bad, you are right.

Also, memory usage shouldn't be a problem.  Gzip uses way less (tested
on ungoogled-chromium, I get a ~16 KB peak heap size for gzip, 8 MB for
zstd, and 32 MB for lzip), but I'd expect Guix to be running on systems
with more than 8 MB of memory.

Just for reference, here's the memory profiling script:

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: profiling --]
[-- Type: text/x-org, Size: 10135 bytes --]

#+PROPERTY: header-args:bash :session valgrind

* NAR decompression memory benchmark

#+begin_src bash :results none
guix environment --ad-hoc valgrind lzip gzip zstd wget
#+end_src

#+begin_src bash :cache yes
valgrind --version | sed 's/-/ /'
lzip --version | head -1
gunzip --version | head -1 | sed 's/\s(/(/'
zstd --version | sed -e 's/command.*v//' -e 's/,.*//' -e 's/**//'
#+end_src

#+RESULTS[e07cecfd5cc770b7a898408b80678f2e8ea7772e]:
| valgrind     | 3.16.1 |
| lzip         | 1.21   |
| gunzip(gzip) | 1.1    |
| zstd         | 1.4.4  |

Just noticed that there should be a new zstd release
([[https://github.com/facebook/zstd/releases/][zstd 1.4.8]]) and a new lzip release
([[https://download.savannah.gnu.org/releases/lzip/][lzip 1.22]]).

** Prepare required data

#+begin_src bash :cache yes
wget https://ci.guix.gnu.org/nar/gzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92 -O uc.nar.gz
wget https://ci.guix.gnu.org/nar/lzip/kfcrrl6p6f6v51jg5rirmq3q067zxih6-ungoogled-chromium-87.0.4280.88-0.b78cb92 -O uc.nar.lz
#+end_src

#+RESULTS[ea17e5a54da1ca54a9c82f264912675d9ca981a0]:

Create the zstd-compressed file:

#+begin_src bash :results none
gunzip -c < uc.nar.gz > uc.nar
zstd -19 uc.nar -o uc.nar.zstd
#+end_src

Check file sizes:

#+begin_src bash
ls -lh --sort=size | head -5
#+end_src

#+RESULTS:
| total      | 585M |      |       |      |     |    |       |             |
| -rw-r--r-- | 1    | nixo | users | 335M | Jan | 29 | 15:14 | uc.nar      |
| -rw-r--r-- | 1    | nixo | users | 103M | Jan | 29 | 15:13 | uc.nar.gz   |
| -rw-r--r-- | 1    | nixo | users | 78M  | Jan | 29 | 15:14 | uc.nar.zstd |
| -rw-r--r-- | 1    | nixo | users | 71M  | Jan | 29 | 15:13 | uc.nar.lz   |

** Decompress

#+name: massif
#+begin_src bash :session valgrind :var command="ls" input="." output="/dev/null" name="ls"
time valgrind --tool=massif --log-file=/dev/null --time-unit=B \
     --trace-children=yes --massif-out-file=$name.massif \
     $command < $input > $output
#+end_src

#+call: massif(command="gunzip -c", input="uc.nar.gz", output="/dev/null", name="gzip")

#+RESULTS:
: real  0m8.291s
: user  0m7.910s
: sys   0m0.201s

#+call: massif(command="lzip -d", input="uc.nar.lz", output="/dev/null", name="lzip")

#+RESULTS:
: real  0m22.378s
: user  0m20.959s
: sys   0m0.345s

#+call: massif(command="zstd -d", input="uc.nar.zstd", output="/dev/null", name="zstd")

#+RESULTS:
: real  0m4.607s
: user  0m4.157s
: sys   0m0.135s

** Check massif output

#+begin_src bash :results raw drawer
for ext in gzip lzip zstd; do
    ms_print $ext.massif > $ext.graph
done
#+end_src

#+RESULTS:
:results:
:end:

--------------------------------------------------------------------------------
Command:            /gnu/store/378zjf2kgajcfd7mfr98jn5xyc5wa3qv-gzip-1.10/bin/gzip -d -c
Massif arguments:   --time-unit=B --massif-out-file=gzip.massif
ms_print arguments: gzip.massif
--------------------------------------------------------------------------------

[ms_print ASCII heap graph elided: peak 15.59 KB; x-axis 0-114.8 MB
(time in bytes allocated)]

Number of snapshots: 53
 Detailed snapshots: [5 (peak), 21, 39, 42, 46, 50]

--------------------------------------------------------------------------------
Command:            lzip -d
Massif arguments:   --time-unit=B --massif-out-file=lzip.massif
ms_print arguments: lzip.massif
--------------------------------------------------------------------------------

[ms_print ASCII heap graph elided: peak 32.09 MB; x-axis 0-64.18 MB
(time in bytes allocated)]

Number of snapshots: 12
 Detailed snapshots: [6 (peak)]

--------------------------------------------------------------------------------
Command:            zstd -d
Massif arguments:   --time-unit=B --massif-out-file=zstd.massif
ms_print arguments: zstd.massif
--------------------------------------------------------------------------------

[ms_print ASCII heap graph elided: peak 8.665 MB; x-axis 0-17.33 MB
(time in bytes allocated)]

Number of snapshots: 18
 Detailed snapshots: [9 (peak)]

^ permalink raw reply	[flat|nested] 43+ messages in thread
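(A note for anyone rerunning this: the peak heap can also be extracted
from the massif files without ms_print, since each massif snapshot
records its heap size on a "mem_heap_B=<bytes>" line.  A minimal sketch,
assuming the *.massif files produced above:)

--8<---------------cut here---------------start------------->8---
# The maximum mem_heap_B across snapshots is the peak heap usage.
for ext in gzip lzip zstd; do
    peak=$(grep '^mem_heap_B=' $ext.massif | cut -d= -f2 | sort -n | tail -1)
    echo "$ext peak heap: $peak bytes"
done
--8<---------------cut here---------------end--------------->8---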
* Re: When substitute download + decompression is CPU-bound
  2021-01-29 11:23               ` Guillaume Le Vaillant
  2021-01-29 11:55                 ` Nicolò Balzarotti
@ 2021-02-01 22:18                 ` Ludovic Courtès
  1 sibling, 0 replies; 43+ messages in thread
From: Ludovic Courtès @ 2021-02-01 22:18 UTC (permalink / raw)
To: Guillaume Le Vaillant; +Cc: guix-devel

Hi,

Guillaume Le Vaillant <glv@posteo.net> skribis:

> Pierre Neidhardt <mail@ambrevar.xyz> skribis:

[...]

>>> It’s not as nice as the ability to choose a download strategy, as we
>>> discussed earlier, but implementing that download strategy sounds
>>> tricky.
>>
>> If the user can choose their favourite substitute compression, I believe
>> that is usually enough, since they are the best judge of their bandwidth
>> / hardware requirements.

As should be clear from what Guillaume and Nico posted, it’s pretty
hard to determine whether you need one compression algorithm or the
other, and it changes as you move your laptop around (different
networking, different CPU frequency scaling strategy, etc.).

> Here are a few numbers for the installation time in seconds (download
> time + decompression time) when fetching 580 MB of substitutes, for
> download speeds between 0.5 MB/s and 20 MB/s.
>
> | Download speed (MB/s) | gzip -9 | lzip -9 | zstd -19 |
> |-----------------------+---------+---------+----------|
> |                   0.5 |     287 |     151 |      181 |
> |                   1.0 |     144 |      78 |       91 |
> |                   1.5 |      97 |      54 |       61 |
> |                   2.0 |      73 |      42 |       46 |
> |                   2.5 |      59 |      35 |       37 |
> |                   3.0 |      49 |      30 |       31 |
> |                   3.5 |      42 |      27 |       26 |
> |                   4.0 |      37 |      24 |       23 |
> |                   4.5 |      33 |      22 |       21 |
> |                   5.0 |      30 |      21 |       19 |
> |                   5.5 |      28 |      19 |       17 |
> |                   6.0 |      25 |      18 |       16 |
> |                   6.5 |      24 |      17 |       14 |
> |                   7.0 |      22 |      17 |       14 |
> |                   7.5 |      21 |      16 |       13 |
> |                   8.0 |      20 |      15 |       12 |
> |                   8.5 |      18 |      15 |       11 |
> |                   9.0 |      18 |      14 |       11 |
> |                   9.5 |      17 |      14 |       10 |
> |                  10.0 |      16 |      13 |       10 |
> |                  11.0 |      15 |      13 |        9 |
> |                  12.0 |      14 |      12 |        8 |
> |                  13.0 |      13 |      12 |        8 |
> |                  14.0 |      12 |      11 |        7 |
> |                  15.0 |      11 |      11 |        7 |
> |                  16.0 |      11 |      11 |        6 |
> |                  17.0 |      10 |      10 |        6 |
> |                  18.0 |      10 |      10 |        6 |
> |                  19.0 |       9 |      10 |        5 |
> |                  20.0 |       9 |      10 |        5 |
>
> When the download speed is lower than 3.5 MB/s, Lzip is better, and
> above that speed Zstd is better.
>
> As Gzip is never the best choice, it would make sense to drop it, even
> if we have to wait a little until everyone has updated their Guix daemon
> to a version with at least Lzip support.

Right.  We can drop it eventually, maybe soon since only 1% of our
downloads pick gzip.

> I think there are many people (like me) with a download speed slower
> than 3 MB/s, so like Pierre I would prefer keeping "lzip -9" and
> "zstd -19".

Understood.  To me, that means we need to implement something smart.

Ludo’.

^ permalink raw reply	[flat|nested] 43+ messages in thread
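(One shape such a "smart" strategy could take, sketched purely as an
illustration; this is not code from 'guix substitute', and the 3.5 MB/s
threshold is the lzip/zstd crossover from Guillaume's table, which would
need per-machine calibration:)

--8<---------------cut here---------------start------------->8---
#!/bin/sh
# choose_format SPEED_MBPS: pick the cheaper codec for this bandwidth.
choose_format () {
    # exit 0 (true) when the measured speed is below the crossover
    if awk -v s="$1" 'BEGIN { exit !(s < 3.5) }'; then
        echo lzip    # slow link: the smaller archive wins
    else
        echo zstd    # fast link: decompression speed wins
    fi
}
choose_format 1.0     # => lzip
choose_format 10.0    # => zstd
--8<---------------cut here---------------end--------------->8---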
* Re: When substitute download + decompression is CPU-bound
  2021-01-28 17:58           ` Ludovic Courtès
  2021-01-29  9:45             ` Pierre Neidhardt
@ 2021-01-29 13:33             ` zimoun
  1 sibling, 0 replies; 43+ messages in thread
From: zimoun @ 2021-01-29 13:33 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: Guix Devel

Hi,

On Thu, 28 Jan 2021 at 19:07, Ludovic Courtès <ludo@gnu.org> wrote:

> We could do that.  I suppose a possible agenda would be:
>
>   1. Start providing zstd substitutes anytime.  However, most clients
>      will keep choosing lzip because it usually compresses better.
>
>   2. After the next release, stop providing lzip substitutes and
>      provide only gzip + zstd-19.
>
> This option has the advantage that it wouldn’t break any installation.
> It’s not as nice as the ability to choose a download strategy, as we
> discussed earlier, but implementing that download strategy sounds
> tricky.

I propose to announce at the next release (v1.3) that strategy X will be
dropped at the next next release (v1.4), explaining the daemon upgrade
and/or pointing to the documentation.

From my understanding (thanks, Guillaume, for the plots!), X means
gzip.  And we should keep lzip-9 (for users with a weak network) and
zstd-19, as Pierre and Guillaume are proposing.

So gzip would stay until v1.4, i.e., more or less 1 year (or 1.5 years)
more.

All the best,
simon

^ permalink raw reply	[flat|nested] 43+ messages in thread
end of thread, other threads:[~2021-03-20 11:24 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-14 22:20 When substitute download + decompression is CPU-bound Ludovic Courtès
2020-12-14 22:29 ` Julien Lepiller
2020-12-14 22:59 ` Nicolò Balzarotti
2020-12-15  7:52   ` Pierre Neidhardt
2020-12-15  9:45     ` Nicolò Balzarotti
2020-12-15  9:54       ` Pierre Neidhardt
2020-12-15 10:03         ` Nicolò Balzarotti
2020-12-15 10:13           ` Pierre Neidhardt
2020-12-15 10:14             ` Pierre Neidhardt
2020-12-15 11:42               ` Ludovic Courtès
2020-12-15 12:31                 ` Pierre Neidhardt
2020-12-18 14:59                   ` Ludovic Courtès
2020-12-18 15:33                     ` Pierre Neidhardt
2020-12-15 11:36           ` Ludovic Courtès
2020-12-15 11:45             ` Nicolò Balzarotti
2020-12-15 10:40 ` Jonathan Brielmaier
2020-12-15 19:43 ` Joshua Branson
2021-01-07 10:45 ` Guillaume Le Vaillant
2021-01-07 11:00   ` Pierre Neidhardt
2021-01-07 11:33     ` Guillaume Le Vaillant
2021-01-14 21:51       ` Ludovic Courtès
2021-01-14 22:08         ` Nicolò Balzarotti
2021-01-28 17:53           ` Are gzip-compressed substitutes still used? Ludovic Courtès
2021-03-17 17:12             ` Ludovic Courtès
2021-03-17 17:33               ` Léo Le Bouter
2021-03-17 18:08                 ` Vagrant Cascadian
2021-03-18  0:03                   ` zimoun
2021-03-18 16:00                     ` Vagrant Cascadian
2021-03-18 18:53                       ` Leo Famulari
2021-03-20 11:23                     ` Ludovic Courtès
2021-03-17 18:06               ` zimoun
2021-03-17 18:20               ` Jonathan Brielmaier
2021-03-18 17:25               ` Pierre Neidhardt
2021-01-15  8:10         ` When substitute download + decompression is CPU-bound Pierre Neidhardt
2021-01-28 17:58           ` Ludovic Courtès
2021-01-29  9:45             ` Pierre Neidhardt
2021-01-29 11:23               ` Guillaume Le Vaillant
2021-01-29 11:55                 ` Nicolò Balzarotti
2021-01-29 12:13                   ` Pierre Neidhardt
2021-01-29 13:06                     ` Guillaume Le Vaillant
2021-01-29 14:55                     ` Nicolò Balzarotti
2021-02-01 22:18                 ` Ludovic Courtès
2021-01-29 13:33             ` zimoun