unofficial mirror of guix-devel@gnu.org 
* Narinfo negative and transient error caching
@ 2021-03-05 22:27 Christopher Baines
  2021-04-19 20:55 ` Christopher Baines
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Christopher Baines @ 2021-03-05 22:27 UTC
  To: guix-devel

Hey,

This has been on my mind for a while, as I wonder what effect it has on
users fetching substitutes.

The narinfo caching as I understand it works as follows:

 Default success TTL => 36 hours
 Negative TTL        => 1 hour
 Transient error TTL => 10 minutes

I'm ignoring the success TTL; I'm just interested in the negative and
transient error values. Negative means that when a server says it
doesn't have an output, that response will be cached for an
hour. Transient errors are for other HTTP response codes, like 504.
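To make the effect of these defaults concrete, here is a minimal Python sketch of a client-side narinfo cache that picks a TTL from the HTTP status alone. It is not Guix's implementation (that is Scheme, in `guix/substitutes.scm`); the class and function names are hypothetical, and the sketch deliberately ignores any server-supplied `Cache-Control` header.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Default TTLs from the table above, in seconds.
SUCCESS_TTL = 36 * 3600
NEGATIVE_TTL = 60 * 60
TRANSIENT_ERROR_TTL = 10 * 60


def default_ttl(status: int) -> int:
    """Choose a default TTL from the HTTP status of a narinfo response."""
    if status == 200:
        return SUCCESS_TTL
    if status == 404:              # "the server doesn't have this output"
        return NEGATIVE_TTL
    return TRANSIENT_ERROR_TTL     # anything else, e.g. 500 or 504


@dataclass
class CacheEntry:
    status: int
    expires_at: float


class NarinfoCache:
    """Hypothetical in-memory cache keyed by the store path's hash part."""

    def __init__(self) -> None:
        self._entries: Dict[str, CacheEntry] = {}

    def record(self, hash_part: str, status: int, now: float) -> None:
        """Remember a server response received at time `now` (seconds)."""
        self._entries[hash_part] = CacheEntry(status, now + default_ttl(status))

    def lookup(self, hash_part: str, now: float) -> Optional[int]:
        """Return the cached status, or None if the server must be asked again."""
        entry = self._entries.get(hash_part)
        if entry is None or now >= entry.expires_at:
            return None
        return entry.status
```

With these defaults, a 404 recorded at `now=0` is still served from the cache at `now=3599`, even if the server gained the substitute at `now=60`: that is the one-hour window this thread is about.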

I had a look through the Git history: caching negative lookups has been
a thing for a while. Caching transient errors was added, but I couldn't
see why.

Personally, I don't see a reason to keep either behaviour?

In an extreme case, the Guix Build Coordinator has to go to some lengths
to work around this caching. Asking the guix-daemon whether a substitute
exists is dangerous, as it can cost a whole hour if that substitute isn't
available yet but will be shortly (which happens all the time when
building a batch of things). Currently, the Coordinator checks for the
substitute itself, and only asks the guix-daemon to fetch the item once
it knows the substitute exists. The transient error caching is also
problematic, as it imposes a 10 minute penalty whenever there's a server
issue.
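The Coordinator's workaround can be sketched roughly as follows: query the substitute server directly for the item's narinfo, and only hand the item to the daemon when the server answers 200. This is a hypothetical Python illustration, not the Coordinator's code. It relies on the narinfo URL convention `<substitute-url>/<hash>.narinfo`, where `<hash>` is the hash part of the store file name; `http_status` and `substitute_known_to_exist` are invented names.

```python
import posixpath
import urllib.error
import urllib.request
from typing import Callable


def narinfo_url(substitute_url: str, store_path: str) -> str:
    """Map /gnu/store/<hash>-<name> to <substitute_url>/<hash>.narinfo."""
    file_name = posixpath.basename(store_path)     # "<hash>-<name>"
    hash_part = file_name.split("-", 1)[0]
    return f"{substitute_url.rstrip('/')}/{hash_part}.narinfo"


def http_status(url: str) -> int:
    """GET `url` and return its HTTP status code, including 4xx/5xx codes.
    Network failures (urllib.error.URLError) propagate to the caller."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def substitute_known_to_exist(substitute_url: str, store_path: str,
                              fetch_status: Callable[[str], int] = http_status) -> bool:
    """Ask the server directly, so no 404 lands in the client-side narinfo cache."""
    return fetch_status(narinfo_url(substitute_url, store_path)) == 200
```

Only when `substitute_known_to_exist(...)` returns True would the item be passed on to the daemon; `fetch_status` is injectable so the check can be tested without network access.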

Any thoughts?

Thanks,

Chris


* Re: Narinfo negative and transient error caching
  2021-03-05 22:27 Narinfo negative and transient error caching Christopher Baines
@ 2021-04-19 20:55 ` Christopher Baines
  2021-04-22 22:11 ` Ludovic Courtès
  2021-04-22 22:14 ` Narinfo negative and transient error caching Ludovic Courtès
  2 siblings, 0 replies; 8+ messages in thread
From: Christopher Baines @ 2021-04-19 20:55 UTC
  To: guix-devel

Christopher Baines <mail@cbaines.net> writes:

> This has been on my mind for a while, as I wonder what effect it has on
> users fetching substitutes.
>
> The narinfo caching as I understand it works as follows:
>
>  Default success TTL => 36 hours
>  Negative TTL        => 1 hour
>  Transient error TTL => 10 minutes
>
> I'm ignoring the success TTL; I'm just interested in the negative and
> transient error values. Negative means that when a server says it
> doesn't have an output, that response will be cached for an
> hour. Transient errors are for other HTTP response codes, like 504.
>
> I had a look through the Git history: caching negative lookups has been
> a thing for a while. Caching transient errors was added, but I couldn't
> see why.
>
> Personally, I don't see a reason to keep either behaviour?

I've now sent a patch to remove this behaviour:

  https://issues.guix.gnu.org/47897


* Re: Narinfo negative and transient error caching
  2021-03-05 22:27 Narinfo negative and transient error caching Christopher Baines
  2021-04-19 20:55 ` Christopher Baines
@ 2021-04-22 22:11 ` Ludovic Courtès
  2021-04-22 23:14   ` Christopher Baines
  2021-04-22 22:14 ` Narinfo negative and transient error caching Ludovic Courtès
  2 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2021-04-22 22:11 UTC
  To: Christopher Baines; +Cc: guix-devel, 47897

Hi!

(“Sorry for the long delay” is officially my motto at this point.)

Christopher Baines <mail@cbaines.net> skribis:

> This has been on my mind for a while, as I wonder what effect it has on
> users fetching substitutes.
>
> The narinfo caching as I understand it works as follows:
>
>  Default success TTL => 36 hours
>  Negative TTL        => 1 hour
>  Transient error TTL => 10 minutes
>
> I'm ignoring the success TTL; I'm just interested in the negative and
> transient error values. Negative means that when a server says it
> doesn't have an output, that response will be cached for an
> hour. Transient errors are for other HTTP response codes, like 504.

You’re looking at the default TTLs, which are not the actual TTLs.
Specifically, servers can include a ‘Cache-Control’ header in their
reply specifying the TTL of their choice, and ‘guix substitute’ honors
that:

  https://git.savannah.gnu.org/cgit/guix.git/tree/guix/substitutes.scm#n200
  https://git.savannah.gnu.org/cgit/guix.git/tree/guix/scripts/publish.scm#n371

‘guix publish’ returns 404 with a TTL of 5 minutes when the requested
item is in the store but needs to be “baked”.

However, ‘guix publish’ does not set ‘Cache-Control’ when the requested
item is not in the store.  In that case, clients use ‘%narinfo-negative-ttl’
(1h).
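A rough Python sketch of the resulting TTL choice, assuming the server advertises its TTL as `Cache-Control: max-age=<seconds>`: the header, when present, overrides the default for 200 and 404 responses, and a 404 without it falls back to the one-hour negative default. As confirmed later in this thread, the transient-error TTL cannot be overridden, so the sketch ignores the header for other statuses. The names are hypothetical, and the parsing is deliberately simpler than a full Cache-Control parser.

```python
import re
from typing import Mapping, Optional

SUCCESS_TTL = 36 * 3600
NEGATIVE_TTL = 60 * 60
TRANSIENT_ERROR_TTL = 10 * 60

# Matches a max-age directive anywhere in a comma-separated Cache-Control value.
_MAX_AGE = re.compile(r"(?:^|,)\s*max-age\s*=\s*(\d+)\s*(?:,|$)", re.IGNORECASE)


def advertised_max_age(headers: Mapping[str, str]) -> Optional[int]:
    """Return max-age in seconds from a Cache-Control header, if any."""
    for name, value in headers.items():
        if name.lower() == "cache-control":
            match = _MAX_AGE.search(value)
            if match:
                return int(match.group(1))
    return None


def narinfo_ttl(status: int, headers: Mapping[str, str]) -> int:
    """TTL for caching a narinfo response, honouring max-age for 200 and 404."""
    if status == 200:
        default = SUCCESS_TTL
    elif status == 404:
        default = NEGATIVE_TTL
    else:
        return TRANSIENT_ERROR_TTL   # fixed: the header is not consulted here
    max_age = advertised_max_age(headers)
    return default if max_age is None else max_age
```

For instance, the 404 that ‘guix publish’ sends for an item still being baked carries a 5-minute TTL, so `narinfo_ttl(404, {"Cache-Control": "max-age=300"})` gives 300, while a bare 404 gives 3600.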

> I had a look through the Git history: caching negative lookups has been
> a thing for a while. Caching transient errors was added, but I couldn't
> see why.

Transient error caching was most likely added in the days of
hydra.gnu.org, that VM that was extremely slow.  When overloaded, you’d
get 500 or similar, and at that point it was safer for clients to wait
and come back later, possibly much later.  :-)

> Personally, I don't see a reason to keep either behaviour?

The main arguments for these negative TTLs are:

  1. Reducing server load: if the server doesn’t have libreoffice, don’t
     come back asking every 10s, it’s prolly useless.  You could easily
     have “GET storms” for libreoffice if clients don’t restrain
     themselves.

  2. Improving client performance: don’t GET things that are likely to
     fail.
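To put rough numbers on the first argument, the following Python sketch counts the GETs a single client would send for an item the server lacks, under the hypothetical assumption that the client re-checks it every `poll_every_s` seconds. The polling model is only an illustration; the function name is invented.

```python
def gets_issued(window_s: int, poll_every_s: int, negative_ttl_s: int) -> int:
    """Count GETs sent in `window_s` seconds for a missing item that is
    re-checked every `poll_every_s` seconds, when 404s are cached for
    `negative_ttl_s` seconds."""
    gets = 0
    cached_until = 0
    for t in range(0, window_s, poll_every_s):
        if t >= cached_until:           # no valid cached 404: ask the server
            gets += 1
            cached_until = t + negative_ttl_s
    return gets
```

Over one hour with a 10-second re-check, this gives 360 GETs per client with no negative caching, 6 with a 10-minute TTL, and 1 with a one-hour TTL.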

Now, the penalty it imposes is annoying.  I’ve sometimes found myself
working around it, too (because I knew the server was going to have the
store item sooner than 1h).

Rather than removing it entirely, I can think of these options:

  1. Reduce the default negative timeouts.

  2. Add an option to ‘guix publish’ (and to the Coordinator?) so they
     send a ‘Cache-Control’ header with the chosen TTL on 404.  That
     way, if the server operator doesn’t mind extra load, they can run
     “guix publish --negative-ttl=0”.
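Here is a hypothetical Python sketch of how a server could choose the headers of its narinfo 404s under option 2. `negative_ttl` stands in for the proposed `--negative-ttl` flag, and the 5-minute value for items still being baked is the one ‘guix publish’ is described as using above. The function name and structure are illustrative, not the actual ‘guix publish’ code.

```python
from typing import Dict, Optional, Tuple

BAKING_TTL = 5 * 60   # TTL sent while a store item's nar is still being baked


def narinfo_not_found(in_store: bool,
                      negative_ttl: Optional[int]) -> Tuple[int, Dict[str, str]]:
    """Status and headers for a narinfo request that cannot be served yet."""
    headers: Dict[str, str] = {}
    if in_store:
        # The item exists but is not baked yet: ask clients to retry soon.
        headers["Cache-Control"] = f"max-age={BAKING_TTL}"
    elif negative_ttl is not None:
        # Operator-chosen negative TTL; 0 marks the 404 as stale immediately.
        headers["Cache-Control"] = f"max-age={negative_ttl}"
    # With neither, no header is sent and clients use their own default.
    return 404, headers
```

Keeping `None` as "send nothing" preserves today's behaviour for operators who do not pass the option.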

WDYT?  Does that make any sense?

Ludo’.


* Re: Narinfo negative and transient error caching
  2021-03-05 22:27 Narinfo negative and transient error caching Christopher Baines
  2021-04-19 20:55 ` Christopher Baines
  2021-04-22 22:11 ` Ludovic Courtès
@ 2021-04-22 22:14 ` Ludovic Courtès
  2 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2021-04-22 22:14 UTC
  To: Christopher Baines; +Cc: guix-devel, 47897

BTW, one thing that would be interesting too is to return 404 with a
long ‘Cache-Control’ validity when the requested store item is among the
cached failures.

We could also add an extra response header to explicitly communicate
that the store item is known to fail to build.
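A minimal Python sketch of that idea: if the requested item is among the cached build failures, the 404 gets a long `max-age` plus an extra header. Both the one-week TTL and the `X-Guix-Build-Failed` header name are hypothetical placeholders; the message specifies neither.

```python
from typing import Dict, Iterable

KNOWN_FAILURE_TTL = 7 * 24 * 3600             # hypothetical "long" validity
KNOWN_FAILURE_HEADER = "X-Guix-Build-Failed"  # invented name, not a real header


def annotate_known_failure(item: str, failures: Iterable[str],
                           headers: Dict[str, str]) -> Dict[str, str]:
    """Return 404 headers, overriding the TTL when `item` is a cached failure."""
    if item not in set(failures):
        return headers
    annotated = dict(headers)                 # leave the caller's dict untouched
    annotated["Cache-Control"] = f"max-age={KNOWN_FAILURE_TTL}"
    annotated[KNOWN_FAILURE_HEADER] = "1"
    return annotated
```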

Ludo’.


* Re: Narinfo negative and transient error caching
  2021-04-22 22:11 ` Ludovic Courtès
@ 2021-04-22 23:14   ` Christopher Baines
  2021-05-11 13:09     ` bug#47897: [PATCH] substitutes: Don't cache negative lookups or transient errors Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Christopher Baines @ 2021-04-22 23:14 UTC
  To: Ludovic Courtès; +Cc: guix-devel, 47897

Ludovic Courtès <ludo@gnu.org> writes:

> Hi!
>
> (“Sorry for the long delay” is officially my motto at this point.)
>
> Christopher Baines <mail@cbaines.net> skribis:
>
>> This has been on my mind for a while, as I wonder what effect it has on
>> users fetching substitutes.
>>
>> The narinfo caching as I understand it works as follows:
>>
>>  Default success TTL => 36 hours
>>  Negative TTL        => 1 hour
>>  Transient error TTL => 10 minutes
>>
>> I'm ignoring the success TTL; I'm just interested in the negative and
>> transient error values. Negative means that when a server says it
>> doesn't have an output, that response will be cached for an
>> hour. Transient errors are for other HTTP response codes, like 504.
>
> You’re looking at the default TTLs, which are not the actual TTLs.
> Specifically, servers can include a ‘Cache-Control’ header in their
> reply specifying the TTL of their choice, and ‘guix substitute’ honors
> that:
>
>   https://git.savannah.gnu.org/cgit/guix.git/tree/guix/substitutes.scm#n200
>   https://git.savannah.gnu.org/cgit/guix.git/tree/guix/scripts/publish.scm#n371
>
> ‘guix publish’ returns 404 with a TTL of 5 minutes when the requested
> item is in the store but needs to be “baked”.
>
> However, ‘guix publish’ does not set ‘Cache-Control’ when the requested
> item is not in the store.  In that case, clients use ‘%narinfo-negative-ttl’
> (1h).

You're right that the negative TTL is just a default, so it's possible
to override the default behaviour in the success and negative lookup
cases, but I don't believe the Cache-Control header is used for
transient errors.

>> I had a look through the Git history: caching negative lookups has been
>> a thing for a while. Caching transient errors was added, but I couldn't
>> see why.
>
> Transient error caching was most likely added in the days of
> hydra.gnu.org, that VM that was extremely slow.  When overloaded, you’d
> get 500 or similar, and at that point it was safer for clients to wait
> and come back later, possibly much later.  :-)
>
>> Personally, I don't see a reason to keep either behaviour?
>
> The main arguments for these negative TTLs are:
>
>   1. Reducing server load: if the server doesn’t have libreoffice, don’t
>      come back asking every 10s, it’s prolly useless.  You could easily
>      have “GET storms” for libreoffice if clients don’t restrain
>      themselves.
>
>   2. Improving client performance: don’t GET things that are likely to
>      fail.

As you say, for the negative TTL, the question here is really what's the
best default value, if a server isn't specifying one.

Given that a negative narinfo response is usually followed by a local
build of that item, I have my doubts about the two arguments above.
This assumes the most common case is users asking Guix to install and
upgrade things.

If a user gets a negative response, they'll just build it instead and
not check for that narinfo again. Even if they cancel that build when
they realise they don't want to build libreoffice, they'll wait a bit
anyway before retrying.

> Now, the penalty it imposes is annoying.  I’ve sometimes found myself
> working around it, too (because I knew the server was going to have the
> store item sooner than 1h).
>
> Rather than removing it entirely, I can think of these options:
>
>   1. Reduce the default negative timeouts.

I think reducing it is good; as you say, it's possible to override the
default from the server side. Just in case someone wants caching
behaviour, it might be worth keeping that functionality at least.

>   2. Add an option to ‘guix publish’ (and to the Coordinator?) so they
>      send a ‘Cache-Control’ header with the chosen TTL on 404.  That
>      way, if the server operator doesn’t mind extra load, they can run
>      “guix publish --negative-ttl=0”.

That sounds sensible. The Guix Build Coordinator doesn't do any serving,
that's left to something else like nginx. For the deployments I maintain
though, I don't think I'm setting the relevant headers, but I'll look at
changing that.

Going back to the %narinfo-transient-error-ttl, if I'm correct in saying
that it's not possible to override that, maybe that should also use the
relevant header value if set?


* Re: bug#47897: [PATCH] substitutes: Don't cache negative lookups or transient errors.
  2021-04-22 23:14   ` Christopher Baines
@ 2021-05-11 13:09     ` Ludovic Courtès
  2021-05-14  7:31       ` Christopher Baines
  0 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2021-05-11 13:09 UTC
  To: Christopher Baines; +Cc: guix-devel, 47897

Hi,

Christopher Baines <mail@cbaines.net> skribis:

>> Now, the penalty it imposes is annoying.  I’ve sometimes found myself
>> working around it, too (because I knew the server was going to have the
>> store item sooner than 1h).
>>
>> Rather than removing it entirely, I can think of these options:
>>
>>   1. Reduce the default negative timeouts.
>
> I think reducing it is good; as you say, it's possible to override the
> default from the server side. Just in case someone wants caching
> behaviour, it might be worth keeping that functionality at least.

OK, let’s do that.

>>   2. Add an option to ‘guix publish’ (and to the Coordinator?) so they
>>      send a ‘Cache-Control’ header with the chosen TTL on 404.  That
>>      way, if the server operator doesn’t mind extra load, they can run
>>      “guix publish --negative-ttl=0”.
>
> That sounds sensible. The Guix Build Coordinator doesn't do any serving,
> that's left to something else like nginx. For the deployments I maintain
> though, I don't think I'm setting the relevant headers, but I'll look at
> changing that.

Cool.

> Going back to the %narinfo-transient-error-ttl, if I'm correct in saying
> that it's not possible to override that, maybe that should also use the
> relevant header value if set?

Correct, ‘%narinfo-transient-error-ttl’ cannot be overridden.  We can
halve it if you think that’s useful, though when that happens, it means
something’s wrong with the server (returning 500 or similar).

I’ve sent patches to address this, lemme know what you think!

Thanks,
Ludo’.


* Re: bug#47897: [PATCH] substitutes: Don't cache negative lookups or transient errors.
  2021-05-11 13:09     ` bug#47897: [PATCH] substitutes: Don't cache negative lookups or transient errors Ludovic Courtès
@ 2021-05-14  7:31       ` Christopher Baines
  2021-05-16 21:31         ` Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Christopher Baines @ 2021-05-14  7:31 UTC
  To: Ludovic Courtès; +Cc: guix-devel, 47897

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Christopher Baines <mail@cbaines.net> skribis:
>
>>> Now, the penalty it imposes is annoying.  I’ve sometimes found myself
>>> working around it, too (because I knew the server was going to have the
>>> store item sooner than 1h).
>>>
>>> Rather than removing it entirely, I can think of these options:
>>>
>>>   1. Reduce the default negative timeouts.
>>
>> I think reducing it is good; as you say, it's possible to override the
>> default from the server side. Just in case someone wants caching
>> behaviour, it might be worth keeping that functionality at least.
>
> OK, let’s do that.
>
>>>   2. Add an option to ‘guix publish’ (and to the Coordinator?) so they
>>>      send a ‘Cache-Control’ header with the chosen TTL on 404.  That
>>>      way, if the server operator doesn’t mind extra load, they can run
>>>      “guix publish --negative-ttl=0”.
>>
>> That sounds sensible. The Guix Build Coordinator doesn't do any serving,
>> that's left to something else like nginx. For the deployments I maintain
>> though, I don't think I'm setting the relevant headers, but I'll look at
>> changing that.
>
> Cool.
>
>> Going back to the %narinfo-transient-error-ttl, if I'm correct in saying
>> that it's not possible to override that, maybe that should also use the
>> relevant header value if set?
>
> Correct, ‘%narinfo-transient-error-ttl’ cannot be overridden.  We can
> halve it if you think that’s useful, though when that happens, it means
> something’s wrong with the server (returning 500 or similar).
>
> I’ve sent patches to address this, lemme know what you think!

The patches you've sent look good.


* Re: bug#47897: [PATCH] substitutes: Don't cache negative lookups or transient errors.
  2021-05-14  7:31       ` Christopher Baines
@ 2021-05-16 21:31         ` Ludovic Courtès
  0 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2021-05-16 21:31 UTC
  To: Christopher Baines; +Cc: guix-devel, 47897-done

Hi,

Christopher Baines <mail@cbaines.net> skribis:

> The patches you've sent look good.

Pushed as 938ffcbb0589adc07dc12c79eda3e1e2bb9e7cf8 (I was generous and
lowered ‘%narinfo-negative-ttl’ to 10 minutes :-)).

Thanks,
Ludo’.


Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
