* Parallel (de-)compression
From: Andreas Enge @ 2015-12-02 18:45 UTC
To: guix-devel
Hello,
on my relatively slow ARM build machine with relatively fast storage (SSD),
I notice that often there is an xz process taking 100% of CPU, while there
is never more than 20MB/s written to disk. For instance, texlive-texmf
takes a very long time to build and install into the store.
Would it make sense to switch to a parallel (de-)compression tool to leverage
higher numbers of cores? We have pbzip2 already in Guix, which is compatible
with bzip2.
As a downside, bzip2 does not compress as well as xz, so we would increase
the size of our packages and also the bandwidth requirement. Maybe this is
not worth it, since we could also build more packages in parallel instead.
Or are there parallel implementations of xz?
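A minimal sketch of the two options mentioned here, assuming xz >= 5.2
(which added built-in threading via --threads/-T) and pbzip2 are installed;
the directory name is only a placeholder:

  # Built-in threaded xz: -T0 starts one compressor thread per core.
  # Threaded mode splits the input into blocks, so the output is slightly
  # larger than with single-threaded xz.
  $ tar -cf - texlive-texmf | xz -T0 > texlive-texmf.tar.xz

  # pbzip2 as a drop-in parallel replacement for bzip2 inside tar.
  $ tar -cf texlive-texmf.tar.bz2 --use-compress-program=pbzip2 texlive-texmf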
Andreas
* Re: Parallel (de-)compression
From: Andreas Enge @ 2015-12-02 18:52 UTC
To: guix-devel
How about this:
http://anthon.home.xs4all.nl/rants/2013/parallel_xz/
?
Andreas
* Re: Parallel (de-)compression
From: Efraim Flashner @ 2015-12-02 19:18 UTC
To: Andreas Enge; +Cc: guix-devel
On Wed, 2 Dec 2015 19:52:17 +0100
Andreas Enge <andreas@enge.fr> wrote:
> How about this:
> http://anthon.home.xs4all.nl/rants/2013/parallel_xz/
> ?
>
> Andreas
>
>
From what I've read, pbzip2 breaks the input into chunks, compresses them in
parallel, and then reassembles everything into a single .bz2 stream. On one
of my mail folders containing ~3000 pieces of mail (~1 GiB), my 2-core laptop
took 10 minutes with bzip2 and 7 minutes with pbzip2 to compress the mail,
with the resulting .tar.bz2 files differing in size by only ~100 KiB. I
didn't get a chance to test decompression because my battery died :). One of
the big pluses of pbzip2 is that the compressed files are fully compatible
with bzip2, and I believe all the commonly used flags are also supported. It
can also be dropped into tar with --use-compress-program=pbzip2.
I know much less about parallel xz.
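A sketch of the comparison described above (the maildir name is a
placeholder; pbzip2 picks the number of threads automatically, or it can be
set explicitly with -p<n>):

  $ time tar -cjf mail.tar.bz2 Maildir            # plain bzip2 via tar's -j
  $ time tar -cf mail-p.tar.bz2 --use-compress-program=pbzip2 Maildir
  $ bzip2 -t mail-p.tar.bz2                       # pbzip2 output is valid bzip2 data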
--
Efraim Flashner <efraim@flashner.co.il> אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted
* Re: Parallel (de-)compression
From: Ludovic Courtès @ 2015-12-04 14:44 UTC
To: Andreas Enge; +Cc: guix-devel
Andreas Enge <andreas@enge.fr> writes:
> on my relatively slow ARM build machine with relatively fast storage (SSD),
> I notice that often there is an xz process taking 100% of CPU, while there
> is never more than 20MB/s written to disk. For instance, texlive-texmf
> takes a very long time to build and install into the store.
Are you saying that xz-compressing TeX Live to resend it to
hydra.gnu.org is too CPU-intensive?
> Would it make sense to switch to a parallel (de-)compression tool to leverage
> higher numbers of cores? We have pbzip2 already in Guix, which is compatible
> with bzip2.
Bzip2 provides a CPU/compression ratio tradeoff that is not as good as
xz, so I would avoid it.
Another option would be to trade compression ratio for reduced CPU usage
by using, say, ‘xz -2’ or ‘gzip’.
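For instance, the trade-off can be eyeballed on any large store item (the
item name below is a placeholder; -6 is xz's default level):

  $ time tar -cf - item | gzip -9 > item.tar.gz
  $ time tar -cf - item | xz -2   > item-2.tar.xz
  $ time tar -cf - item | xz -6   > item-6.tar.xz
  $ ls -lh item.tar.gz item-2.tar.xz item-6.tar.xz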
We did something similar in 5ef9d7d to reduce CPU consumption on the
front-end. Usually it’s much less important to reduce CPU consumption
on the build machines, but your experience seems to suggest otherwise.
Thoughts?
Ludo’.
* Re: Parallel (de-)compression
From: Andreas Enge @ 2015-12-06 15:31 UTC
To: Ludovic Courtès; +Cc: guix-devel
On Fri, Dec 04, 2015 at 03:44:38PM +0100, Ludovic Courtès wrote:
> Are you saying that xz-compressing TeX Live to resend it to
> hydra.gnu.org is too CPU-intensive?
That depends on your definition of "too". In any case, on the Novena board
with an SSD attached, CPU is the limiting factor during this phase,
pushing the CPU load on one core to 100% (while the other cores are idle).
> Another option would be to trade compression ratio for reduced CPU usage
> by using, say, ‘xz -2’ or ‘gzip’.
> We did something similar in 5ef9d7d to reduce CPU consumption on the
> front-end. Usually it’s much less important to reduce CPU consumption
> on the build machines, but your experience seems to suggest otherwise.
If possible, it would be more interesting to leverage the several cores
without sacrificing compression quality. Note that I did not measure the
individual timings either: compression is, I think, done separately from
sending the compressed file, so it is entirely possible that the data
transfer takes longer than the compression.
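One way to check would be to time the two steps separately; here scp merely
stands in for whatever transfer mechanism is actually used, and the file
name is a placeholder:

  $ time xz -T0 -k texlive-texmf.tar                   # compression only (-k keeps the input)
  $ time scp texlive-texmf.tar.xz hydra.gnu.org:/tmp/  # transfer only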
Andreas
* Re: Parallel (de-)compression
From: Ludovic Courtès @ 2015-12-06 22:21 UTC
To: Andreas Enge; +Cc: guix-devel
Andreas Enge <andreas@enge.fr> writes:
> On Fri, Dec 04, 2015 at 03:44:38PM +0100, Ludovic Courtès wrote:
>> Are you saying that xz-compressing TeX Live to resend it to
>> hydra.gnu.org is too CPU-intensive?
>
> That depends on your definition of "too". In any case, on the Novena board
> with an SSD attached, CPU is the limiting factor during this phase,
> pushing the CPU load on one core to 100% (while the other cores are idle).
OK.
>> Another option would be to trade compression ratio for reduced CPU usage
>> by using, say, ‘xz -2’ or ‘gzip’.
>> We did something similar in 5ef9d7d to reduce CPU consumption on the
>> front-end. Usually it’s much less important to reduce CPU consumption
>> on the build machines, but your experience seems to suggest otherwise.
>
> If possible, it would be more interesting to leverage the several cores
> without sacrificing compression quality.
It’s not necessarily the best option to increase throughput: the build
machine may be busy building other things, and thus unable to dedicate
all its cores to compression.
Dunno.
Ludo’.