unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
From: ludo@gnu.org (Ludovic Courtès)
To: Mark H Weaver <mhw@netris.org>
Cc: 26201@debbugs.gnu.org, guix-sysadmin@gnu.org
Subject: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
Date: Fri, 24 Mar 2017 10:25:35 +0100	[thread overview]
Message-ID: <87d1d710xc.fsf@gnu.org> (raw)
In-Reply-To: <87y3vvozy5.fsf@netris.org> (Mark H. Weaver's message of "Fri, 24 Mar 2017 04:12:50 -0400")

Hi!

Mark H Weaver <mhw@netris.org> skribis:

> Tobias Geerinckx-Rice <me@tobias.gr> writes:

[...]

>> Are you sure? I was under the impression¹ that this is exactly what
>> ‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so please
>> — anyone! — correct me if I'm misguided.
>
> I agree that "proxy_cache_lock on" should prevent multiple concurrent
> requests for the same URL, but unfortunately its behavior is quite
> undesirable, and arguably worse than leaving it off in our case.  See:
>
>   https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock
>
> Specifically:
>
>   Other requests of the same cache element will either wait for a
>   response to appear in the cache or the cache lock for this element to
>   be released, up to the time set by the proxy_cache_lock_timeout
>   directive.
>
> In our problem case, it takes more than an hour for Hydra to finish
> sending a response for the 'texlive-texmf' nar.  During that time, the
> nar will be slowly sent to the first client while it's being packed and
> bzipped on-demand.
>
> IIUC, with "proxy_cache_lock on", we have two choices of how other
> client requests will be treated:
>
> (1) If we increase "proxy_cache_lock_timeout" to a huge value, then
>     there will *no* data sent to the other clients until the first
>     client has received the entire nar, which means they wait over an
>     hour before receiving the first byte.  I guess this will result in
>     timeouts on the client side.
>
> (2) If "proxy_cache_lock_timeout" is *not* huge, then all other clients
>     will get failure responses until the first client has received the
>     entire nar.
>
> Either way, this would cause users to see the same download failures
> (requiring user work-arounds like --fallback) that this fix is intended
> to prevent for 'texlive-texmf', but instead of happening only for that
> one nar, it will now happen for *all* large nars.

My understanding is that proxy_cache_lock allows us to avoid spawning
concurrent compression threads of the same item at the same time, while
also avoiding starvation (proxy_cache_lock_timeout should ensure that
nobody ends up waiting until the nar-compression process is done.)

IOW, it should help reduce load in most cases, while introducing small
delays in some cases (if you’re downloading a nar that’s already being
downloaded.)

> IMO, the best solution is to *never* generate nars on Hydra in response
> to client requests, but rather to have the build slaves pack and
> compress the nars, copy them to Hydra, and then serve them as static
> files using nginx.

The problem is that we want nars to be signed by the master node.  Or,
if we don’t require that, we need a PKI that allows us to express the
fact that hydra.gnu.org delegates to the build machines.

> A far inferior solution, but possibly acceptable and closer to the
> current approach, would be to arrange for all concurrent responses for
> the same nar to be sent incrementally from a single nar-packing process.
> More concretely, while packing and sending a nar response to the first
> client, the data would also be written to a file.  Subsequent requests
> for the same nar would be serviced using the equivalent of:
>
>   tail --bytes=+0 --follow FILENAME
>
> This way, no one would have to wait an hour to receive the first byte.

Yes.  I would think that NGINX does something like that for its caching,
but I don’t know exactly when/how.

Other solutions I’ve thought about:

  1. Produce narinfos and nars periodically rather than on-demand and
     serve them as static files.

     pros: better HTTP latency and bandwidth
     pros: allows us to add a Content-Length for nars
     cons: doesn’t reduce load on hydra.gnu.org
     cons: introduces arbitrary delays in delivering nars
     cons: difficult/expensive to know what new store items are available

  2. Produce a narinfo and corresponding nar the first time they are
     requested.  So, the first time we receive “GET foo.narinfo”, return
     404 and spawn a thread to compute foo.narinfo and foo.nar.  Return
     200 only when both are ready.

     The precomputed nar{,info}s would be kept in a cache and we could
     make sure a narinfo and its nar have the same lifetime, which
     addresses one of the problems we have.

     pros: better HTTP latency and bandwidth
     pros: allows us to add a Content-Length for nars
     pros: helps keep narinfo/nar lifetime in sync
     cons: doesn’t reduce load on hydra.gnu.org
     cons: exposes inconsistency between the store contents and the HTTP
           response (you may get 404 even if the thing is actually in
           store), but maybe that’s not a problem

Thoughts?

Ludo’.

  reply	other threads:[~2017-03-24  9:26 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-21  1:44 bug#26201: No notification of cache misses when downloading substitutes dian_cecht
2017-03-21  2:46 ` Tobias Geerinckx-Rice
2017-03-21  2:52   ` dian_cecht
2017-03-21  3:57     ` Tobias Geerinckx-Rice
2017-03-21  4:48       ` dian_cecht
2017-03-21  6:21         ` Tobias Geerinckx-Rice
2017-03-21  6:49           ` dian_cecht
2017-03-21 14:55             ` Tobias Geerinckx-Rice
2017-03-21 15:32               ` dian_cecht
2017-03-21 16:07                 ` Tobias Geerinckx-Rice
2017-03-24  2:15                   ` Maxim Cournoyer
2017-03-21 12:59         ` Florian Pelz
2017-03-21 15:35           ` dian_cecht
2017-03-21 16:43       ` Ludovic Courtès
2017-03-21 17:08         ` Tobias Geerinckx-Rice
2017-03-22 22:06           ` Ludovic Courtès
2017-03-23 19:25             ` bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos Tobias Geerinckx-Rice
2017-03-22 22:22           ` Ludovic Courtès
2017-03-23 10:29             ` Ricardo Wurmus
2017-03-23 18:36             ` Mark H Weaver
2017-03-23 18:52               ` Tobias Geerinckx-Rice
2017-03-24  8:12                 ` Mark H Weaver
2017-03-24  9:25                   ` Ludovic Courtès [this message]
2017-04-17 21:36                     ` Ludovic Courtès
2017-04-18 21:27                     ` Ludovic Courtès
2017-04-19 14:24                       ` bug#26201: Heads-up: hydra.gnu.org uses ‘guix publish --cache’ Ludovic Courtès
2017-03-26 17:35                   ` bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos Tobias Geerinckx-Rice
2017-03-27 18:47                     ` Tobias Geerinckx-Rice
2017-03-28 14:47                       ` Ludovic Courtès
2017-05-03  8:11                     ` Mark H Weaver
2017-05-03  9:25                       ` Ludovic Courtès
2017-03-27 11:20             ` bug#26201: Bandwidth when retrieving substitutes Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87d1d710xc.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=26201@debbugs.gnu.org \
    --cc=guix-sysadmin@gnu.org \
    --cc=mhw@netris.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).