unofficial mirror of guix-devel@gnu.org 
* Suggestion: disable offloading for texlive builds on hydra?
From: Mark H Weaver @ 2014-10-26  7:36 UTC
  To: guix-devel

When texlive is built on hydra, the build slave that built it is tied up
for 12 hours or more waiting for the build outputs (over 3 gigabytes!)
to be transferred back to hydra.

By design, only one transfer can happen at a time from a given build
slave, so during those 12 hours, the build slave's CPU is left idle, and
typically another 3 built-but-not-yet-transferred packages must wait
until the texlive transfer finishes.
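
For scale, here is a back-of-the-envelope estimate of the effective transfer rate those figures imply. It is only a sketch: it assumes exactly 3 GiB and exactly 12 hours, whereas both numbers above are approximate lower bounds.

```python
# Rough estimate of the effective transfer rate from a build slave to hydra.
# Assumptions (illustrative only): exactly 3 GiB sent in exactly 12 hours.
GIB = 1024 ** 3

output_bytes = 3 * GIB
transfer_seconds = 12 * 3600

rate_kib_per_s = output_bytes / transfer_seconds / 1024
print(f"effective rate: {rate_kib_per_s:.1f} KiB/s")
```

Under those assumptions the link delivers roughly 73 KiB/s, so the transfer time dwarfs the build time.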

I suggest that we arrange for hydra.gnu.org to build texlive locally for
x86_64 and i686, to avoid this problem.

What do you think?

      Mark

* Re: Suggestion: disable offloading for texlive builds on hydra?
From: John Darrington @ 2014-10-26  7:49 UTC
  To: Mark H Weaver; +Cc: guix-devel

On Sun, Oct 26, 2014 at 03:36:03AM -0400, Mark H Weaver wrote:
     When texlive is built on hydra, the build slave that built it is tied up
     for 12 hours or more waiting for the build outputs (over 3 gigabytes!)
     to be transferred back to hydra.
     
     By design, only one transfer can happen at a time from a given build
     slave, so during those 12 hours, the build slave's CPU is left idle, and
     typically another 3 built-but-not-yet-transferred packages must wait
     until the texlive transfer finishes.

Why is it designed like that?  It seems like a poor design to me.
     
     I suggest that we arrange for hydra.gnu.org to build texlive locally for
     x86_64 and i686, to avoid this problem.

Would it help if texlive was split into more outputs?  For example, the docs 
take up a lot of space, and not everyone needs them.
     
J'

-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://sks-keyservers.net or any PGP keyserver for public key.

* Re: Suggestion: disable offloading for texlive builds on hydra?
From: Ludovic Courtès @ 2014-10-26 14:12 UTC
  To: John Darrington; +Cc: guix-devel

John Darrington <john@darrington.wattle.id.au> skribis:

> On Sun, Oct 26, 2014 at 03:36:03AM -0400, Mark H Weaver wrote:
>      When texlive is built on hydra, the build slave that built it is tied up
>      for 12 hours or more waiting for the build outputs (over 3 gigabytes!)
>      to be transferred back to hydra.
>      
>      By design, only one transfer can happen at a time from a given build
>      slave, so during those 12 hours, the build slave's CPU is left idle, and
>      typically another 3 built-but-not-yet-transferred packages must wait
>      until the texlive transfer finishes.
>
> Why is it designed like that?  It seems like a poor design to me.

The rationale was that, in general, you just slow everything down by
sending several things at once.  TeX Live is a pathological case in that
respect.

As for disabling offloading, see my reply to Federico: we could expose
#:local-build? to gnu-build-system, and use that here, but at the moment
that also disables substitutes.  WDYT?

>      I suggest that we arrange for hydra.gnu.org to build texlive locally for
>      x86_64 and i686, to avoid this problem.
>
> Would it help if texlive was split into more outputs?  For example, the docs 
> take up a lot of space, and not everyone needs them.

I think it may help a bit, at least by leaving a small window during
which other builds could get started.  And it would also be more
convenient for users, who could choose whether to install the whole
thing or not.

When you mentioned it some time ago on IRC, I gave it a try, but then
failed to actually test the patch due to...  ENOSPC.  :-)

Anyway, here’s the patch I had.  I’d be happy if you or someone else
could just confirm it works as expected:

diff --git a/gnu/packages/texlive.scm b/gnu/packages/texlive.scm
index e562b02..bc0ece7 100644
--- a/gnu/packages/texlive.scm
+++ b/gnu/packages/texlive.scm
@@ -88,7 +88,7 @@
       ("pkg-config" ,pkg-config)
       ("python" ,python-2) ; incompatible with Python 3 (print syntax)
       ("tcsh" ,tcsh)))
-   (outputs '("out" "data"))
+   (outputs '("out" "data" "doc"))
    (arguments
     `(#:out-of-source? #t
       #:configure-flags

Data point: there’s 1.6 GiB in texmf-dist/doc (which the patch above
splits out), and 1.4 GiB in texmf-dist/fonts.

Another option Andreas and I discussed a while back would be to use a
fixed-output derivation for the data, since it’s really what it is.
That’s a bit hacky though: we’d have to install it, compute the hash of
the installed files, and then use that as the derivation’s output hash.
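
To illustrate the "compute the hash of the installed files" step, here is a minimal sketch of a deterministic hash over a directory tree. It is not the format Guix uses (Guix hashes a "nar" serialization of the tree); the function name `tree_hash` and the sample file are made up for the example.

```python
import hashlib
import os
import tempfile


def tree_hash(root):
    """Return a SHA-256 hex digest covering relative paths and file contents.

    Illustrative only: this is NOT the nar-based hash that Guix computes.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            digest.update(rel.encode("utf-8") + b"\0")
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            digest.update(b"\0")
    return digest.hexdigest()


# Two separately created trees with identical contents hash identically.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    for root in (a, b):
        os.makedirs(os.path.join(root, "fonts"))
        with open(os.path.join(root, "fonts", "example.tfm"), "wb") as f:
            f.write(b"placeholder font data")
    hash_a, hash_b = tree_hash(a), tree_hash(b)

print(hash_a == hash_b)
```

The point of the sketch is that the hash depends only on the installed content, which is what would let the data be declared as a fixed-output derivation once that hash is known.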

Thanks,
Ludo’.

* Re: Suggestion: disable offloading for texlive builds on hydra?
From: Mark H Weaver @ 2014-10-26 16:07 UTC
  To: Ludovic Courtès; +Cc: guix-devel

ludo@gnu.org (Ludovic Courtès) writes:

> John Darrington <john@darrington.wattle.id.au> skribis:
>
>> On Sun, Oct 26, 2014 at 03:36:03AM -0400, Mark H Weaver wrote:
>>      When texlive is built on hydra, the build slave that built it is tied up
>>      for 12 hours or more waiting for the build outputs (over 3 gigabytes!)
>>      to be transferred back to hydra.
>>      
>>      By design, only one transfer can happen at a time from a given build
>>      slave, so during those 12 hours, the build slave's CPU is left idle, and
>>      typically another 3 built-but-not-yet-transferred packages must wait
>>      until the texlive transfer finishes.
>>
>> Why is it designed like that?  It seems like a poor design to me.
>
> The rationale was that, in general, you just slow everything down by
> sending several things at once.

I have my doubts that it would slow things down very much, if at all.
The number of parallel transfers would still be limited to a small
number, typically 4 per build slave.  The expense associated with
running multiple processes on a CPU is mainly due to cache effects, but
I wouldn't expect that to be an issue with network connections,
especially when those connections are between the same two hosts.  The
practice of using multiple connections is well established in web
browsers and IMAP clients, as long as the number is not too large.

We're losing a huge amount of available CPU capacity in our build farm
(probably over 30 machine-hours per texlive rebuild) in exchange for a
dubious increase in network efficiency.

The more I think about it, the more I agree with John that we've chosen
the wrong tradeoff here.  I think we should remove those mutexes.
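
A toy model of the tradeoff, as a sketch only: one link of fixed bandwidth, with transfers either serialized (one at a time, as with the mutex) or all started at once and sharing the link equally. The package sizes and the bandwidth are hypothetical, and the model ignores any per-connection overhead, which is exactly the cost the serialized design was meant to avoid.

```python
def serialized_finish_times(sizes, bandwidth):
    """One transfer at a time, in the given order, each using the whole link."""
    finish, clock = [], 0.0
    for size in sizes:
        clock += size / bandwidth
        finish.append(clock)
    return finish


def shared_finish_times(sizes, bandwidth):
    """All transfers start together and split the link equally."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    finish = [0.0] * len(sizes)
    clock, sent = 0.0, 0.0  # 'sent' is what each active transfer has sent so far
    active = len(sizes)
    for i in order:
        # Every active transfer advances at bandwidth / active.
        clock += (sizes[i] - sent) * active / bandwidth
        sent = sizes[i]
        finish[i] = clock
        active -= 1
    return finish


# Hypothetical sizes in MiB: texlive first, then three small packages.
sizes = [3072, 50, 20, 5]
bandwidth = 256.0  # MiB per hour, chosen so that texlive alone takes 12 hours

serial = serialized_finish_times(sizes, bandwidth)
shared = shared_finish_times(sizes, bandwidth)
for size, s, p in zip(sizes, serial, shared):
    print(f"{size:5d} MiB  serialized: {s:6.2f} h  shared: {p:6.2f} h")
```

In this model the last transfer finishes at the same time either way, but the three small packages arrive within the first half hour instead of waiting more than 12 hours behind texlive.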

> diff --git a/gnu/packages/texlive.scm b/gnu/packages/texlive.scm
> index e562b02..bc0ece7 100644
> --- a/gnu/packages/texlive.scm
> +++ b/gnu/packages/texlive.scm
> @@ -88,7 +88,7 @@
>        ("pkg-config" ,pkg-config)
>        ("python" ,python-2) ; incompatible with Python 3 (print syntax)
>        ("tcsh" ,tcsh)))
> -   (outputs '("out" "data"))
> +   (outputs '("out" "data" "doc"))
>     (arguments
>      `(#:out-of-source? #t
>        #:configure-flags
>
>
> Data point: there’s 1.6 GiB in texmf-dist/doc (which the patch above
> splits out), and 1.4 GiB in texmf-dist/fonts.

I'd definitely be in favor of splitting out the docs.

> Another option Andreas and I discussed a while back would be to use a
> fixed-output derivation for the data, since it’s really what it is.
> That’s a bit hacky though: we’d have to install it, compute the hash of
> the installed files, and then use that as the derivation’s output hash.

Hmm.  It is indeed a hack, but maybe worth considering.  When I think
about Guix users downloading over 3 GiB from our humble hydra quite
often just to have TeX, it makes me worry about our bandwidth
requirements.

    Thanks,
      Mark

* Re: Suggestion: disable offloading for texlive builds on hydra?
From: Ludovic Courtès @ 2014-10-27 12:58 UTC
  To: Mark H Weaver; +Cc: guix-devel

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:

[...]

>> The rationale was that, in general, you just slow everything down by
>> sending several things at once.
>
> I have my doubts that it would slow things down very much, if at all.
> The number of parallel transfers would still be limited to a small
> number, typically 4 per build slave.  The expense associated with
> running multiple processes on a CPU is mainly due to cache effects, but
> I wouldn't expect that to be an issue with network connections,
> especially when those connections are between the same two hosts.  The
> practice of using multiple connections is well established in web
> browsers and IMAP clients, as long as the number is not too large.
>
> We're losing a huge amount of available CPU capacity in our build farm
> (probably over 30 machine-hours per texlive rebuild) in exchange for a
> dubious increase in network efficiency.
>
> The more I think about it, the more I agree with John that we've chosen
> the wrong tradeoff here.  I think we should remove those mutexes.

Hmm OK.  I’m happy to try that (it’s a two-line change plus deployment).

I can do it in the next few days, but I’m happy if you do it.  :-)

>> diff --git a/gnu/packages/texlive.scm b/gnu/packages/texlive.scm
>> index e562b02..bc0ece7 100644
>> --- a/gnu/packages/texlive.scm
>> +++ b/gnu/packages/texlive.scm
>> @@ -88,7 +88,7 @@
>>        ("pkg-config" ,pkg-config)
>>        ("python" ,python-2) ; incompatible with Python 3 (print syntax)
>>        ("tcsh" ,tcsh)))
>> -   (outputs '("out" "data"))
>> +   (outputs '("out" "data" "doc"))
>>     (arguments
>>      `(#:out-of-source? #t
>>        #:configure-flags
>>
>>
>> Data point: there’s 1.6 GiB in texmf-dist/doc (which the patch above
>> splits out), and 1.4 GiB in texmf-dist/fonts.
>
> I'd definitely be in favor of splitting out the docs.

OK, I’ll test it locally and commit if nothing breaks.

>> Another option Andreas and I discussed a while back would be to use a
>> fixed-output derivation for the data, since it’s really what it is.
>> That’s a bit hacky though: we’d have to install it, compute the hash of
>> the installed files, and then use that as the derivation’s output hash.
>
> Hmm.  It is indeed a hack, but maybe worth considering.  When I think
> about Guix users downloading over 3 GiB from our humble hydra quite
> often just to have TeX, it makes me worry about our bandwidth
> requirements.

Agreed.

Ludo’.

* Re: Suggestion: disable offloading for texlive builds on hydra?
From: Ludovic Courtès @ 2014-10-28 23:55 UTC
  To: Mark H Weaver; +Cc: guix-devel

Mark H Weaver <mhw@netris.org> skribis:

> The more I think about it, the more I agree with John that we've chosen
> the wrong tradeoff here.  I think we should remove those mutexes.

Done in commit 940a8c5, which I’ve just deployed on hydra.gnu.org.

Ludo’.

* Re: Suggestion: disable offloading for texlive builds on hydra?
From: Andreas Enge @ 2014-10-29 12:29 UTC
  To: Ludovic Courtès; +Cc: guix-devel

On Sun, Oct 26, 2014 at 03:12:40PM +0100, Ludovic Courtès wrote:
> -   (outputs '("out" "data"))
> +   (outputs '("out" "data" "doc"))

I just tried this, and it does not work:
builder for `/gnu/store/r39sf9gzfdlxb6q2c4zaz18z63mmc8fz-texlive-2014.drv' failed to produce output path `/gnu/store/s756nm0dcw57h64vimq6bz3hzmf4s40p-texlive-2014-doc'

So I think we need to shuffle things around ourselves. I will try to have a
look. A problem is that I need a working texlive, and compiling an additional
one may take too much space (in a leap of confidence, I just deleted my
texlive before trying the splitting of the docs, but I will not do this again;
well, using a usb stick for /tmp helps a bit...).

Apparently, texlive does not honour --docdir=..., although it does not
complain about the option. I asked about it on the texlive mailing list
and will wait for a suggestion.

In any case, it should be easy to just move the documentation directory; but
I will first need to recompile a working texlive to have a closer look at the
directory...
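
As a sketch of what "just move the documentation directory" amounts to: relocate `texmf-dist/doc` from the data output into a separate doc output, keeping its relative path. In Guix this would be a build phase written in Scheme; the Python below only illustrates the file shuffling. The helper name `move_docs` and the directory layout inside each output are assumptions.

```python
import os
import shutil
import tempfile


def move_docs(data_output, doc_output,
              subdir=os.path.join("texmf-dist", "doc")):
    """Move SUBDIR from DATA_OUTPUT into DOC_OUTPUT, keeping its relative path."""
    source = os.path.join(data_output, subdir)
    target = os.path.join(doc_output, subdir)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.move(source, target)
    return target


# Demonstration on a throwaway tree (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    data = os.path.join(tmp, "texlive-data")
    doc = os.path.join(tmp, "texlive-doc")
    os.makedirs(os.path.join(data, "texmf-dist", "doc", "latex"))
    os.makedirs(os.path.join(data, "texmf-dist", "fonts"))
    move_docs(data, doc)
    docs_moved = os.path.isdir(os.path.join(doc, "texmf-dist", "doc", "latex"))
    docs_left_behind = os.path.exists(os.path.join(data, "texmf-dist", "doc"))

print(docs_moved, docs_left_behind)
```

Creating the doc output this way is also what would satisfy the "failed to produce output path" check, since every declared output has to exist once the build finishes.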

Andreas

* Re: Suggestion: disable offloading for texlive builds on hydra?
From: Andreas Enge @ 2014-10-29 16:20 UTC
  To: Ludovic Courtès; +Cc: guix-devel

On Wed, Oct 29, 2014 at 01:29:01PM +0100, Andreas Enge wrote:
> A problem is that I need a working texlive, and compiling an additional
> one may take too much space.

Actually, now that deduplication works, having several texlive installations
with the same data is not a problem any more!

Andreas

* Re: Suggestion: disable offloading for texlive builds on hydra?
From: Andreas Enge @ 2014-10-29 21:50 UTC
  To: Mark H Weaver; +Cc: guix-devel

On Sun, Oct 26, 2014 at 12:07:13PM -0400, Mark H Weaver wrote:
> Hmm.  It is indeed a hack, but maybe worth considering.  When I think
> about Guix users downloading over 3 GiB from our humble hydra quite
> often just to have TeX, it makes me worry about our bandwidth
> requirements.

What do you mean by "just to have TeX"? This is definitely one of the most
important pieces of software I am using. And having all of it including its
documentation with one installation is a big gain over Debian, where one
must always be afraid of being on the train and missing this one crucial
LaTeX style file, or not being able to look up all the obscure options of
algorithm2e.sty ;-)

One option would be to have something like "texlive-small", containing only
the binaries and a smallish subset of texmf-dist, excluding the documentation
and most of the fonts. Users could then choose between the small and the
full package.

Andreas

* Re: Suggestion: disable offloading for texlive builds on hydra?
From: Ludovic Courtès @ 2014-10-29 22:17 UTC
  To: Andreas Enge; +Cc: guix-devel

Andreas Enge <andreas@enge.fr> skribis:

> On Sun, Oct 26, 2014 at 03:12:40PM +0100, Ludovic Courtès wrote:
>> -   (outputs '("out" "data"))
>> +   (outputs '("out" "data" "doc"))
>
> I just tried this, and it does not work:
> builder for `/gnu/store/r39sf9gzfdlxb6q2c4zaz18z63mmc8fz-texlive-2014.drv' failed to produce output path `/gnu/store/s756nm0dcw57h64vimq6bz3hzmf4s40p-texlive-2014-doc'
>
> So I think we need to shuffle things around ourselves. I will try to have a
> look. A problem is that I need a working texlive, and compiling an additional
> one may take too much space (in a leap of confidence, I just deleted my
> texlive before trying the splitting of the docs, but I will not do this again;
> well, using a usb stick for /tmp helps a bit...).

Heh.  :-)

> Apparently, texlive does not honour --docdir=..., although it does not
> complain about the option. I asked about it on the texlive mailing list
> and will wait for a suggestion.

It’s not uncommon to find configure scripts that ignore --docdir et
al. and instead provide their own option.

Thanks for looking into it.

Ludo’.

Thread overview:
2014-10-26  7:36 Suggestion: disable offloading for texlive builds on hydra? Mark H Weaver
2014-10-26  7:49 ` John Darrington
2014-10-26 14:12   ` Ludovic Courtès
2014-10-26 16:07     ` Mark H Weaver
2014-10-27 12:58       ` Ludovic Courtès
2014-10-28 23:55       ` Ludovic Courtès
2014-10-29 21:50       ` Andreas Enge
2014-10-29 12:29     ` Andreas Enge
2014-10-29 16:20       ` Andreas Enge
2014-10-29 22:17       ` Ludovic Courtès