From: Mark H Weaver
Subject: Re: Suggestion: disable offloading for texlive builds on hydra?
Date: Sun, 26 Oct 2014 12:07:13 -0400
Message-ID: <87a94irf0u.fsf@yeeloong.lan>
References: <87ppdf1dwc.fsf@netris.org> <20141026074926.GA3937@intra> <877fzmncmf.fsf@gnu.org>
In-Reply-To: <877fzmncmf.fsf@gnu.org> (Ludovic Courtès's message of "Sun, 26 Oct 2014 15:12:40 +0100")
To: Ludovic Courtès
Cc: guix-devel@gnu.org

ludo@gnu.org (Ludovic Courtès) writes:

> John Darrington skribis:
>
>> On Sun, Oct 26, 2014 at 03:36:03AM -0400, Mark H Weaver wrote:
>>      When texlive is built on hydra, the build slave that built it is tied up
>>      for 12 hours or more waiting for the build outputs (over 3 gigabytes!)
>>      to be transferred back to hydra.
>>
>>      By design, only one transfer can happen at a time from a given build
>>      slave, so during those 12 hours, the build slave's CPU is left idle, and
>>      typically another 3 built-but-not-yet-transferred packages must wait
>>      until the texlive transfer finishes.
>>
>> Why is it designed like that?  It seems like a poor design to me.
>
> The rationale was that, in general, you just slow everything down by
> sending several things at once.

I have my doubts that it would slow things down very much, if at all.
The number of parallel transfers would still be limited to a small
number, typically 4 per build slave.

The expense associated with running multiple processes on a CPU is
mainly due to cache effects, but I wouldn't expect that to be an issue
with network connections, especially when those connections are between
the same two hosts.  The practice of using multiple connections is well
established in web browsers and imap clients, as long as the number is
not too large.

We're losing a huge amount of available CPU capacity in our build farm
(probably over 30 machine-hours per texinfo rebuild) in exchange for a
dubious increase in network efficiency.

The more I think about it, the more I agree with John that we've chosen
the wrong tradeoff here.  I think we should remove those mutexes.

> diff --git a/gnu/packages/texlive.scm b/gnu/packages/texlive.scm
> index e562b02..bc0ece7 100644
> --- a/gnu/packages/texlive.scm
> +++ b/gnu/packages/texlive.scm
> @@ -88,7 +88,7 @@
>         ("pkg-config" ,pkg-config)
>         ("python" ,python-2) ; incompatible with Python 3 (print syntax)
>         ("tcsh" ,tcsh)))
> -    (outputs '("out" "data"))
> +    (outputs '("out" "data" "doc"))
>      (arguments
>       `(#:out-of-source? #t
>         #:configure-flags
>
> Data point: there’s 1.6 GiB in texmf-dist/doc (which the patch above
> splits out), and 1.4 GiB in texmf-dist/fonts.

I'd definitely be in favor of splitting out the docs.

> Another option Andreas and I discussed a while back would be to use a
> fixed-output derivation for the data, since it’s really what it is.
> That’s a bit hacky though: we’d have to install it, compute the hash of
> the installed files, and then use that as the derivation’s output hash.

Hmm.  It is indeed a hack, but maybe worth considering.

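To make the "compute the hash of the installed files" step concrete,
here is a minimal sketch in Python.  It is only an illustration under
stated assumptions: the helper names tree_hash and make_tree are
hypothetical, and the scheme (SHA-256 over sorted relative paths and
file contents) is an ad hoc stand-in, not the hash Guix computes, which
is taken over a normalized archive (nar) of the output.  It ignores
permissions, symlinks and empty directories, and reads each file fully
into memory.

```python
import hashlib
import os
import tempfile


def tree_hash(root):
    """Return a SHA-256 hex digest over relative paths and contents under root.

    Hypothetical helper for illustration; NOT the nar-based hash Guix uses.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix the traversal order so the result is reproducible
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as f:
                data = f.read()
            # Length-prefix each field so different trees cannot produce
            # the same byte stream by concatenation.
            for field in (rel.encode("utf-8"), data):
                digest.update(len(field).to_bytes(8, "big"))
                digest.update(field)
    return digest.hexdigest()


def make_tree(files):
    """Create a temporary directory from a {relative path: bytes} mapping."""
    root = tempfile.mkdtemp()
    for rel, data in files.items():
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
    return root
```

The point of the sketch is that the digest depends only on what was
installed, not on where or in what order, so two installs of the same
data would yield the same value to record as the expected output hash.
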
When I think about Guix users downloading over 3 GiB from our humble
hydra quite often just to have TeX, it makes me worry about our
bandwidth requirements.

     Thanks,
       Mark