From: ludo@gnu.org (Ludovic Courtès)
Subject: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Fri, 24 Mar 2017 10:25:35 +0100
Message-ID: <87d1d710xc.fsf@gnu.org>
References: <20170320184449.5ac06051@khaalida>
 <144e9ba8-af93-fb18-d2b9-f198ae7c11e9@tobias.gr>
 <20170320195247.05f72fc9@khaalida>
 <8e7e07d1-563f-666f-2c32-2a772757c86f@tobias.gr>
 <8760j2wpfy.fsf@gnu.org>
 <9889a4b5-c300-cd03-1095-1115428067fb@tobias.gr>
 <87r31pyms2.fsf_-_@gnu.org>
 <87inmzrgbf.fsf@netris.org>
 <25b2472a-c705-53fe-f94f-04de9a2d484e@tobias.gr>
 <87y3vvozy5.fsf@netris.org>
In-Reply-To: <87y3vvozy5.fsf@netris.org> (Mark H. Weaver's message of
 "Fri, 24 Mar 2017 04:12:50 -0400")
To: Mark H Weaver
Cc: 26201@debbugs.gnu.org, guix-sysadmin@gnu.org

Hi!

Mark H Weaver skribis:

> Tobias Geerinckx-Rice writes:

[...]

>> Are you sure? I was under the impression¹ that this is exactly what
>> ‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so
>> please — anyone! — correct me if I'm misguided.
>
> I agree that "proxy_cache_lock on" should prevent multiple concurrent
> requests for the same URL, but unfortunately its behavior is quite
> undesirable, and arguably worse than leaving it off in our case. See:
>
>   https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock
>
> Specifically:
>
>   Other requests of the same cache element will either wait for a
>   response to appear in the cache or the cache lock for this element
>   to be released, up to the time set by the proxy_cache_lock_timeout
>   directive.
>
> In our problem case, it takes more than an hour for Hydra to finish
> sending a response for the 'texlive-texmf' nar. During that time, the
> nar will be slowly sent to the first client while it's being packed
> and bzipped on-demand.
>
> IIUC, with "proxy_cache_lock on", we have two choices of how other
> client requests will be treated:
>
> (1) If we increase "proxy_cache_lock_timeout" to a huge value, then
>     there will be *no* data sent to the other clients until the first
>     client has received the entire nar, which means they wait over an
>     hour before receiving the first byte.  I guess this will result
>     in timeouts on the client side.
> (2) If "proxy_cache_lock_timeout" is *not* huge, then all other
>     clients will get failure responses until the first client has
>     received the entire nar.
>
> Either way, this would cause users to see the same download failures
> (requiring user work-arounds like --fallback) that this fix is
> intended to prevent for 'texlive-texmf', but instead of happening
> only for that one nar, it will now happen for *all* large nars.

My understanding is that proxy_cache_lock allows us to avoid spawning
concurrent compression threads for the same item, while also avoiding
starvation (proxy_cache_lock_timeout should ensure that nobody ends up
waiting until the nar-compression process is done).

IOW, it should help reduce load in most cases, while introducing small
delays in some cases (if you’re downloading a nar that’s already being
downloaded); the configuration sketch at the end of this message shows
how these two directives fit together.

> IMO, the best solution is to *never* generate nars on Hydra in
> response to client requests, but rather to have the build slaves pack
> and compress the nars, copy them to Hydra, and then serve them as
> static files using nginx.

The problem is that we want nars to be signed by the master node.  Or,
if we don’t require that, we need a PKI that allows us to express the
fact that hydra.gnu.org delegates to the build machines.

> A far inferior solution, but possibly acceptable and closer to the
> current approach, would be to arrange for all concurrent responses
> for the same nar to be sent incrementally from a single nar-packing
> process.  More concretely, while packing and sending a nar response
> to the first client, the data would also be written to a file.
> Subsequent requests for the same nar would be serviced using the
> equivalent of:
>
>   tail --bytes=+0 --follow FILENAME
>
> This way, no one would have to wait an hour to receive the first byte.

Yes.  I would think that NGINX does something like that for its
caching, but I don’t know exactly when/how.

Other solutions I’ve thought about:

  1. Produce narinfos and nars periodically rather than on-demand and
     serve them as static files.

     pros: better HTTP latency and bandwidth
     pros: allows us to add a Content-Length for nars
     cons: doesn’t reduce load on hydra.gnu.org
     cons: introduces arbitrary delays in delivering nars
     cons: difficult/expensive to know what new store items are
           available

  2. Produce a narinfo and corresponding nar the first time they are
     requested.  So, the first time we receive “GET foo.narinfo”,
     return 404 and spawn a thread to compute foo.narinfo and foo.nar.
     Return 200 only when both are ready.

     The precomputed nar{,info}s would be kept in a cache and we could
     make sure a narinfo and its nar have the same lifetime, which
     addresses one of the problems we have.

     pros: better HTTP latency and bandwidth
     pros: allows us to add a Content-Length for nars
     pros: helps keep narinfo/nar lifetime in sync
     cons: doesn’t reduce load on hydra.gnu.org
     cons: exposes inconsistency between the store contents and the
           HTTP response (you may get 404 even if the thing is
           actually in the store), but maybe that’s not a problem

Thoughts?

Ludo’.
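
PS: To make the proxy_cache_lock discussion above concrete, here is a
minimal sketch of what the relevant nginx front-end configuration could
look like.  It is illustrative only: the cache path, zone name, sizes,
backend address, and timeout value below are placeholders and not our
actual hydra.gnu.org setup; only the directive semantics come from the
nginx documentation linked above.

  # Hypothetical excerpt (http block context); paths, names, sizes,
  # addresses, and durations are placeholders.
  proxy_cache_path /var/cache/nginx/nars keys_zone=nars:32m max_size=100g;

  server {
      location /nar/ {
          proxy_pass http://127.0.0.1:8080;  # 'guix publish' backend (placeholder)
          proxy_cache nars;
          proxy_cache_valid 200 10d;

          # Forward only one request per nar to the backend at a time,
          # so the same item is not packed/compressed concurrently.
          proxy_cache_lock on;

          # A waiter that reaches this timeout is passed to the backend
          # anyway (its response is not cached), so a slow first
          # download does not starve the other clients, at the cost of
          # some duplicate work.
          proxy_cache_lock_timeout 60s;
      }
  }

With a small timeout, waiters fall through to the backend quickly (less
waiting, more duplicate compression); with a large one, they mostly sit
until the first response lands in the cache, which is the trade-off
Mark describes above.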