From: Mark H Weaver
Subject: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Fri, 24 Mar 2017 04:12:50 -0400
Message-ID: <87y3vvozy5.fsf@netris.org>
In-Reply-To: <25b2472a-c705-53fe-f94f-04de9a2d484e@tobias.gr> (Tobias Geerinckx-Rice's message of "Thu, 23 Mar 2017 19:52:30 +0100")
To: Tobias Geerinckx-Rice
Cc: 26201@debbugs.gnu.org, guix-sysadmin@gnu.org

Hi,

Tobias Geerinckx-Rice writes:

> On 23/03/17 19:36, Mark H Weaver wrote:
>> One question: what will happen in the case of multiple concurrent
>> requests for the same nar? Will multiple nar-pack-and-bzip2
>> processes be run on-demand?
>
> I think this used to be the case with the previous nginx
> configuration, but the recent changes pushed by Ludo' were aimed in
> part at preventing that.
>
>> Recall that the nginx proxy will pass all of those requests through,
>
> Are you sure? I was under the impression¹ that this is exactly what
> ‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so
> please -- anyone! -- correct me if I'm misguided.

I agree that "proxy_cache_lock on" should prevent multiple concurrent
requests for the same URL, but unfortunately its behavior is quite
undesirable, and arguably worse than leaving it off in our case. See:

  https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock

Specifically:

  Other requests of the same cache element will either wait for a
  response to appear in the cache or the cache lock for this element
  to be released, up to the time set by the proxy_cache_lock_timeout
  directive.

In our problem case, it takes more than an hour for Hydra to finish
sending a response for the 'texlive-texmf' nar. During that time, the
nar will be slowly sent to the first client while it's being packed
and bzipped on-demand.
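For reference, the directives in question look roughly like this (a
sketch only; the timeout values here are illustrative, not what is
actually deployed on hydra.gnu.org):

  # Sketch of the relevant directives in the nginx proxy config;
  # values are for illustration only.
  proxy_cache_lock on;
  proxy_cache_lock_timeout 5s;    # the default; see choice (2) below
  # proxy_cache_lock_timeout 2h;  # a "huge" value; see choice (1) below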
IIUC, with "proxy_cache_lock on", we have two choices of how other client requests will be treated: (1) If we increase "proxy_cache_lock_timeout" to a huge value, then there will *no* data sent to the other clients until the first client has received the entire nar, which means they wait over an hour before receiving the first byte. I guess this will result in timeouts on the client side. (2) If "proxy_cache_lock_timeout" is *not* huge, then all other clients will get failure responses until the first client has received the entire nar. Either way, this would cause users to see the same download failures (requiring user work-arounds like --fallback) that this fix is intended to prevent for 'texlive-texmf', but instead of happening only for that one nar, it will now happen for *all* large nars. Or at least that's what I'd expect based on my reading of the nginx docs linked above. I haven't tried it. IMO, the best solution is to *never* generate nars on Hydra in response to client requests, but rather to have the build slaves pack and compress the nars, copy them to Hydra, and then serve them as static files using nginx. A far inferior solution, but possibly acceptable and closer to the current approach, would be to arrange for all concurrent responses for the same nar to be sent incrementally from a single nar-packing process. More concretely, while packing and sending a nar response to the first client, the data would also be written to a file. Subsequent requests for the same nar would be serviced using the equivalent of: tail --bytes=3D+0 --follow FILENAME This way, no one would have to wait an hour to receive the first byte. What do you think? Mark