From: ludo@gnu.org (Ludovic Courtès)
Subject: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Fri, 24 Mar 2017 10:25:35 +0100
Message-ID: <87d1d710xc.fsf@gnu.org>
References: <20170320184449.5ac06051@khaalida>
 <144e9ba8-af93-fb18-d2b9-f198ae7c11e9@tobias.gr>
 <20170320195247.05f72fc9@khaalida>
 <8e7e07d1-563f-666f-2c32-2a772757c86f@tobias.gr>
 <8760j2wpfy.fsf@gnu.org>
 <9889a4b5-c300-cd03-1095-1115428067fb@tobias.gr>
 <87r31pyms2.fsf_-_@gnu.org>
 <87inmzrgbf.fsf@netris.org>
 <25b2472a-c705-53fe-f94f-04de9a2d484e@tobias.gr>
 <87y3vvozy5.fsf@netris.org>
In-Reply-To: <87y3vvozy5.fsf@netris.org> (Mark H. Weaver's message of
 "Fri, 24 Mar 2017 04:12:50 -0400")
To: Mark H Weaver
Cc: 26201@debbugs.gnu.org, guix-sysadmin@gnu.org

Hi!

Mark H Weaver skribis:

> Tobias Geerinckx-Rice writes:

[...]

>> Are you sure? I was under the impression¹ that this is exactly what
>> ‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so
>> please — anyone! — correct me if I'm misguided.
>
> I agree that "proxy_cache_lock on" should prevent multiple concurrent
> requests for the same URL, but unfortunately its behavior is quite
> undesirable, and arguably worse than leaving it off in our case. See:
>
>   https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock
>
> Specifically:
>
>   Other requests of the same cache element will either wait for a
>   response to appear in the cache or the cache lock for this element
>   to be released, up to the time set by the proxy_cache_lock_timeout
>   directive.
>
> In our problem case, it takes more than an hour for Hydra to finish
> sending a response for the 'texlive-texmf' nar. During that time, the
> nar will be slowly sent to the first client while it's being packed
> and bzipped on-demand.
>
> IIUC, with "proxy_cache_lock on", we have two choices of how other
> client requests will be treated:
>
> (1) If we increase "proxy_cache_lock_timeout" to a huge value, then
>     there will be *no* data sent to the other clients until the first
>     client has received the entire nar, which means they wait over an
>     hour before receiving the first byte.  I guess this will result
>     in timeouts on the client side.
> (2) If "proxy_cache_lock_timeout" is *not* huge, then all other
>     clients will get failure responses until the first client has
>     received the entire nar.
>
> Either way, this would cause users to see the same download failures
> (requiring user work-arounds like --fallback) that this fix is
> intended to prevent for 'texlive-texmf', but instead of happening
> only for that one nar, it will now happen for *all* large nars.

My understanding is that proxy_cache_lock allows us to avoid spawning
concurrent compression threads for the same item, while also avoiding
starvation (proxy_cache_lock_timeout should ensure that nobody ends up
waiting until the nar-compression process is done).

IOW, it should help reduce load in most cases, while introducing small
delays in some cases (if you’re downloading a nar that’s already being
downloaded); the configuration sketch at the end of this message shows
how these two directives fit together.

> IMO, the best solution is to *never* generate nars on Hydra in
> response to client requests, but rather to have the build slaves pack
> and compress the nars, copy them to Hydra, and then serve them as
> static files using nginx.

The problem is that we want nars to be signed by the master node.  Or,
if we don’t require that, we need a PKI that allows us to express the
fact that hydra.gnu.org delegates to the build machines.

> A far inferior solution, but possibly acceptable and closer to the
> current approach, would be to arrange for all concurrent responses
> for the same nar to be sent incrementally from a single nar-packing
> process.  More concretely, while packing and sending a nar response
> to the first client, the data would also be written to a file.
> Subsequent requests for the same nar would be serviced using the
> equivalent of:
>
>   tail --bytes=+0 --follow FILENAME
>
> This way, no one would have to wait an hour to receive the first byte.

Yes.  I would think that NGINX does something like that for its
caching, but I don’t know exactly when/how.

Other solutions I’ve thought about:

  1. Produce narinfos and nars periodically rather than on-demand and
     serve them as static files.

     pros: better HTTP latency and bandwidth
     pros: allows us to add a Content-Length for nars
     cons: doesn’t reduce load on hydra.gnu.org
     cons: introduces arbitrary delays in delivering nars
     cons: difficult/expensive to know what new store items are
           available

  2. Produce a narinfo and corresponding nar the first time they are
     requested.  So, the first time we receive “GET foo.narinfo”,
     return 404 and spawn a thread to compute foo.narinfo and foo.nar.
     Return 200 only when both are ready.

     The precomputed nar{,info}s would be kept in a cache and we could
     make sure a narinfo and its nar have the same lifetime, which
     addresses one of the problems we have.

     pros: better HTTP latency and bandwidth
     pros: allows us to add a Content-Length for nars
     pros: helps keep narinfo/nar lifetime in sync
     cons: doesn’t reduce load on hydra.gnu.org
     cons: exposes inconsistency between the store contents and the
           HTTP response (you may get 404 even if the thing is
           actually in the store), but maybe that’s not a problem

Thoughts?

Ludo’.
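
PS: To make the proxy_cache_lock discussion above concrete, here is a
minimal sketch of what the relevant nginx front-end configuration could
look like.  It is illustrative only: the cache path, zone name, sizes,
backend address, and timeout value below are placeholders and not our
actual hydra.gnu.org setup; only the directive semantics come from the
nginx documentation linked above.

  # Hypothetical excerpt (http block context); paths, names, sizes,
  # addresses, and durations are placeholders.
  proxy_cache_path /var/cache/nginx/nars keys_zone=nars:32m max_size=100g;

  server {
      location /nar/ {
          proxy_pass http://127.0.0.1:8080;  # 'guix publish' backend (placeholder)
          proxy_cache nars;
          proxy_cache_valid 200 10d;

          # Forward only one request per nar to the backend at a time,
          # so the same item is not packed/compressed concurrently.
          proxy_cache_lock on;

          # A waiter that reaches this timeout is passed to the backend
          # anyway (its response is not cached), so a slow first
          # download does not starve the other clients, at the cost of
          # some duplicate work.
          proxy_cache_lock_timeout 60s;
      }
  }

With a small timeout, waiters fall through to the backend quickly (less
waiting, more duplicate compression); with a large one, they mostly sit
until the first response lands in the cache, which is the trade-off
Mark describes above.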