* Narinfo negative and transient error caching
From: Christopher Baines @ 2021-03-05 22:27 UTC
To: guix-devel

Hey,

This has been on my mind for a while, as I wonder what effect it has on
users fetching substitutes.

The narinfo caching as I understand it works as follows:

  Default success TTL => 36 hours
  Negative TTL => 1 hour
  Transient error TTL => 10 minutes

I'm ignoring the success TTL, I'm just interested in the negative and
transient error values. Negative means that when a server says it
doesn't have an output, that response will be cached for an
hour. Transient errors are for other HTTP response codes, like 504.

I had a look through the Git history, caching negative lookups has been
a thing for a while. Caching transient errors was added, but I couldn't
see why.

Personally I don't see a reason to keep either behaviour?

In an extreme case, the Guix Build Coordinator has to work hard to work
around this caching. Asking the guix-daemon if a substitute exists is
dangerous, as it literally costs an hour if that substitute isn't
available yet, but will be shortly (which happens all the time when
building a bunch of things). Currently it checks itself, and only
continues to ask the guix-daemon to fetch the item if it knows it to
exist.

The transient error caching is also problematic, as that imposes a 10
minute penalty if there's a server issue.

Any thoughts?

Thanks,

Chris
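As a rough illustration of the caching rules described in the message above, here is a minimal Python sketch. It is not the Guix implementation (which is written in Scheme); the names `default_ttl` and `CachedLookup` are made up for this example, and it assumes that 200 counts as success, 404 as a negative lookup, and anything else as a transient error.

```python
from dataclasses import dataclass

# Default TTLs quoted in the message above, in seconds.
SUCCESS_TTL = 36 * 3600          # 36 hours
NEGATIVE_TTL = 3600              # 1 hour
TRANSIENT_ERROR_TTL = 10 * 60    # 10 minutes


def default_ttl(status: int) -> int:
    """Return how long a narinfo lookup with this HTTP status stays cached.

    Assumption for this sketch: 200 is success, 404 is a negative lookup,
    and every other status (for example 504) is a transient error.
    """
    if status == 200:
        return SUCCESS_TTL
    if status == 404:
        return NEGATIVE_TTL
    return TRANSIENT_ERROR_TTL


@dataclass
class CachedLookup:
    """A cached narinfo lookup result (hypothetical structure)."""
    status: int
    fetched_at: float  # seconds since the epoch

    def is_fresh(self, now: float) -> bool:
        # While the entry is fresh, the server is not asked again.
        return now - self.fetched_at < default_ttl(self.status)
```

With these defaults, a 404 cached at time 0 is still trusted 59 minutes later, which is the "costs an hour" problem: a substitute that appears one minute after the lookup is not seen until the entry expires.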
* Re: Narinfo negative and transient error caching
From: Christopher Baines @ 2021-04-19 20:55 UTC
To: guix-devel

Christopher Baines <mail@cbaines.net> writes:

> This has been on my mind for a while, as I wonder what effect it has on
> users fetching substitutes.
>
> The narinfo caching as I understand it works as follows:
>
>   Default success TTL => 36 hours
>   Negative TTL => 1 hour
>   Transient error TTL => 10 minutes
>
> I'm ignoring the success TTL, I'm just interested in the negative and
> transient error values. Negative means that when a server says it
> doesn't have an output, that response will be cached for an
> hour. Transient errors are for other HTTP response codes, like 504.
>
> I had a look through the Git history, caching negative lookups has been
> a thing for a while. Caching transient errors was added, but I couldn't
> see why.
>
> Personally I don't see a reason to keep either behaviour?

I've now sent a patch to remove this behaviour:

  https://issues.guix.gnu.org/47897
* Re: Narinfo negative and transient error caching
From: Ludovic Courtès @ 2021-04-22 22:11 UTC
To: Christopher Baines; +Cc: guix-devel, 47897

Hi!

(“Sorry for the long delay” is officially my motto at this point.)

Christopher Baines <mail@cbaines.net> writes:

> This has been on my mind for a while, as I wonder what effect it has on
> users fetching substitutes.
>
> The narinfo caching as I understand it works as follows:
>
>   Default success TTL => 36 hours
>   Negative TTL => 1 hour
>   Transient error TTL => 10 minutes
>
> I'm ignoring the success TTL, I'm just interested in the negative and
> transient error values. Negative means that when a server says it
> doesn't have an output, that response will be cached for an
> hour. Transient errors are for other HTTP response codes, like 504.

You’re looking at the default TTLs, which are not the actual TTLs.
Specifically, servers can include a ‘Cache-Control’ header in their
reply specifying the TTL of their choice, and ‘guix substitute’ honors
that:

  https://git.savannah.gnu.org/cgit/guix.git/tree/guix/substitutes.scm#n200
  https://git.savannah.gnu.org/cgit/guix.git/tree/guix/scripts/publish.scm#n371

‘guix publish’ returns 404 with a TTL of 5mn when the requested item is
in store but needs to be “baked”.

However, ‘guix publish’ does not set ‘Cache-Control’ when the requested
item is not in store.  In that case, clients use ‘%narinfo-negative-ttl’
(1h).

> I had a look through the Git history, caching negative lookups has been
> a thing for a while. Caching transient errors was added, but I couldn't
> see why.
Transient error caching was most likely added in the days of
hydra.gnu.org, that VM that was extremely slow.  When overloaded, you’d
get 500 or similar, and at that point it was safer for clients to wait
and come back later, possibly much later.  :-)

> Personally I don't see a reason to keep either behaviour?

The main arguments for these negative TTLs are:

  1. Reducing server load: if the server doesn’t have libreoffice, don’t
     come back asking every 10s, it’s prolly useless.  You could easily
     have “GET storms” for libreoffice if clients don’t restrain
     themselves.

  2. Improving client performance: don’t GET things that are likely to
     fail.

Now, the penalty it imposes is annoying.  I’ve sometimes found myself
working around it, too (because I knew the server was going to have the
store item sooner than 1h).

Rather than removing it entirely, I can think of these options:

  1. Reduce the default negative timeouts.

  2. Add an option to ‘guix publish’ (and to the Coordinator?) so they
     send a ‘Cache-Control’ header with the chosen TTL on 404.  That
     way, if the server operator doesn’t mind extra load, they can run
     “guix publish --negative-ttl=0”.

WDYT?  Does that make any sense?

Ludo’.
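The interplay between the client-side defaults and a server-supplied ‘Cache-Control’ header, as described in the message above, could be sketched like this in Python. This is an illustration only, not the code in guix/substitutes.scm: `max_age` and `effective_ttl` are hypothetical names, only the `max-age` directive is considered, and the sketch assumes that replies other than 200 and 404 always use the fixed transient-error default.

```python
import re
from typing import Optional

# Client-side defaults quoted in the thread, in seconds.
SUCCESS_TTL = 36 * 3600
NEGATIVE_TTL = 3600
TRANSIENT_ERROR_TTL = 600

# Matches a "max-age=N" directive at the start of the header value or
# after a comma, so "s-maxage=N" is not picked up by mistake.
_MAX_AGE = re.compile(r"(?:^|,)\s*max-age\s*=\s*(\d+)", re.IGNORECASE)


def max_age(cache_control: Optional[str]) -> Optional[int]:
    """Extract max-age (in seconds) from a Cache-Control value, if any."""
    if not cache_control:
        return None
    match = _MAX_AGE.search(cache_control)
    return int(match.group(1)) if match else None


def effective_ttl(status: int, cache_control: Optional[str] = None) -> int:
    """Pick the TTL for a narinfo reply: server's choice first, else default."""
    if status not in (200, 404):
        # Assumption of this sketch: the header is ignored for other errors.
        return TRANSIENT_ERROR_TTL
    server_ttl = max_age(cache_control)
    if server_ttl is not None:
        return server_ttl
    return SUCCESS_TTL if status == 200 else NEGATIVE_TTL
```

Under these assumptions, a 404 carrying `max-age=300` (the 5mn “baking” case) is cached for five minutes, a 404 without the header falls back to one hour, and a server sending `max-age=0` on 404 effectively disables negative caching for its clients.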
* Re: Narinfo negative and transient error caching
From: Christopher Baines @ 2021-04-22 23:14 UTC
To: Ludovic Courtès; +Cc: guix-devel, 47897

Ludovic Courtès <ludo@gnu.org> writes:

> Hi!
>
> (“Sorry for the long delay” is officially my motto at this point.)
>
> Christopher Baines <mail@cbaines.net> writes:
>
>> This has been on my mind for a while, as I wonder what effect it has on
>> users fetching substitutes.
>>
>> The narinfo caching as I understand it works as follows:
>>
>>   Default success TTL => 36 hours
>>   Negative TTL => 1 hour
>>   Transient error TTL => 10 minutes
>>
>> I'm ignoring the success TTL, I'm just interested in the negative and
>> transient error values. Negative means that when a server says it
>> doesn't have an output, that response will be cached for an
>> hour. Transient errors are for other HTTP response codes, like 504.
>
> You’re looking at the default TTLs, which are not the actual TTLs.
> Specifically, servers can include a ‘Cache-Control’ header in their
> reply specifying the TTL of their choice, and ‘guix substitute’ honors
> that:
>
>   https://git.savannah.gnu.org/cgit/guix.git/tree/guix/substitutes.scm#n200
>   https://git.savannah.gnu.org/cgit/guix.git/tree/guix/scripts/publish.scm#n371
>
> ‘guix publish’ returns 404 with a TTL of 5mn when the requested item is
> in store but needs to be “baked”.
>
> However, ‘guix publish’ does not set ‘Cache-Control’ when the requested
> item is not in store.  In that case, clients use ‘%narinfo-negative-ttl’
> (1h).
You're right that the negative TTL is just a default, so it's possible
to override the default behaviour in the success and negative lookup
cases, but I don't believe the Cache-Control header is used for
transient errors.

>> I had a look through the Git history, caching negative lookups has been
>> a thing for a while. Caching transient errors was added, but I couldn't
>> see why.
>
> Transient error caching was most likely added in the days of
> hydra.gnu.org, that VM that was extremely slow.  When overloaded, you’d
> get 500 or similar, and at that point it was safer for clients to wait
> and come back later, possibly much later.  :-)
>
>> Personally I don't see a reason to keep either behaviour?
>
> The main arguments for these negative TTLs are:
>
>   1. Reducing server load: if the server doesn’t have libreoffice, don’t
>      come back asking every 10s, it’s prolly useless.  You could easily
>      have “GET storms” for libreoffice if clients don’t restrain
>      themselves.
>
>   2. Improving client performance: don’t GET things that are likely to
>      fail.

As you say, for the negative TTL, the question here is really what's the
best default value, if a server isn't specifying one.

Given that most narinfo requests precede a build for that thing if the
response is negative, I have my doubts about those two arguments
above. This is assuming the most common case is users asking guix to
install and upgrade things. If a user gets a negative response, they'll
just build it instead and not check for that narinfo again. Even if they
cancel that build when they realise they don't want to build
libreoffice, they'll wait a bit anyway before retrying.

> Now, the penalty it imposes is annoying.  I’ve sometimes found myself
> working around it, too (because I knew the server was going to have the
> store item sooner than 1h).
>
> Rather than removing it entirely, I can think of these options:
>
>   1. Reduce the default negative timeouts.
I think reducing it is good, as you say, it's possible to override the
default from the server side. Just in case someone wants caching
behaviour, it might be worth keeping that functionality at least.

>   2. Add an option to ‘guix publish’ (and to the Coordinator?) so they
>      send a ‘Cache-Control’ header with the chosen TTL on 404.  That
>      way, if the server operator doesn’t mind extra load, they can run
>      “guix publish --negative-ttl=0”.

That sounds sensible. The Guix Build Coordinator doesn't do any serving,
that's left to something else like nginx. For the deployments I maintain
though, I don't think I'm setting the relevant headers, but I'll look at
changing that.

Going back to the %narinfo-transient-error-ttl, if I'm correct in saying
that it's not possible to override that, maybe that should also use the
relevant header value if set?
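The workaround mentioned at the start of the thread, where the Guix Build Coordinator checks for a substitute itself before involving the guix-daemon, might look roughly like the Python sketch below. This is a guess at the shape of such a check, not the Coordinator's code: `narinfo_url` and `should_ask_daemon` are hypothetical names, the HTTP request is passed in as a `probe` function so the logic can be exercised without a network, and the sketch relies on narinfos being served at `<server>/<hash part>.narinfo`.

```python
from typing import Callable


def narinfo_url(server: str, store_path: str) -> str:
    """Build the narinfo URL for a store item.

    The hash part is the text before the first "-" in the store file
    name, e.g. /gnu/store/<hash>-hello-2.10 -> <server>/<hash>.narinfo
    """
    file_name = store_path.rstrip("/").rsplit("/", 1)[-1]
    hash_part = file_name.split("-", 1)[0]
    return f"{server.rstrip('/')}/{hash_part}.narinfo"


def should_ask_daemon(server: str, store_path: str,
                      probe: Callable[[str], int]) -> bool:
    """Only involve the daemon once the server reports the narinfo exists.

    `probe` takes a URL and returns an HTTP status code (hypothetical
    helper).  Probing directly keeps a "not yet" answer out of the
    daemon's narinfo cache, where it would stick for the negative TTL.
    """
    return probe(narinfo_url(server, store_path)) == 200
```

The point of the design is that a failed probe costs nothing beyond the request itself: the caller can simply probe again a little later, whereas a negative answer obtained through the daemon would be cached.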
* Re: bug#47897: [PATCH] substitutes: Don't cache negative lookups or transient errors.
From: Ludovic Courtès @ 2021-05-11 13:09 UTC
To: Christopher Baines; +Cc: guix-devel, 47897

Hi,

Christopher Baines <mail@cbaines.net> writes:

>> Now, the penalty it imposes is annoying.  I’ve sometimes found myself
>> working around it, too (because I knew the server was going to have the
>> store item sooner than 1h).
>>
>> Rather than removing it entirely, I can think of these options:
>>
>>   1. Reduce the default negative timeouts.
>
> I think reducing it is good, as you say, it's possible to override the
> default from the server side. Just in case someone wants caching
> behaviour, it might be worth keeping that functionality at least.

OK, let’s do that.

>>   2. Add an option to ‘guix publish’ (and to the Coordinator?) so they
>>      send a ‘Cache-Control’ header with the chosen TTL on 404.  That
>>      way, if the server operator doesn’t mind extra load, they can run
>>      “guix publish --negative-ttl=0”.
>
> That sounds sensible. The Guix Build Coordinator doesn't do any serving,
> that's left to something else like nginx. For the deployments I maintain
> though, I don't think I'm setting the relevant headers, but I'll look at
> changing that.

Cool.

> Going back to the %narinfo-transient-error-ttl, if I'm correct in saying
> that it's not possible to override that, maybe that should also use the
> relevant header value if set?

Correct, ‘%narinfo-transient-error-ttl’ cannot be overridden.  We can
halve it if you think that’s useful, though when that happens, it means
something’s wrong with the server (returning 500 or similar).

I’ve sent patches to address this, lemme know what you think!

Thanks,
Ludo’.
* Re: bug#47897: [PATCH] substitutes: Don't cache negative lookups or transient errors.
From: Christopher Baines @ 2021-05-14 7:31 UTC
To: Ludovic Courtès; +Cc: guix-devel, 47897

Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Christopher Baines <mail@cbaines.net> writes:
>
>>> Now, the penalty it imposes is annoying.  I’ve sometimes found myself
>>> working around it, too (because I knew the server was going to have the
>>> store item sooner than 1h).
>>>
>>> Rather than removing it entirely, I can think of these options:
>>>
>>>   1. Reduce the default negative timeouts.
>>
>> I think reducing it is good, as you say, it's possible to override the
>> default from the server side. Just in case someone wants caching
>> behaviour, it might be worth keeping that functionality at least.
>
> OK, let’s do that.
>
>>>   2. Add an option to ‘guix publish’ (and to the Coordinator?) so they
>>>      send a ‘Cache-Control’ header with the chosen TTL on 404.  That
>>>      way, if the server operator doesn’t mind extra load, they can run
>>>      “guix publish --negative-ttl=0”.
>>
>> That sounds sensible. The Guix Build Coordinator doesn't do any serving,
>> that's left to something else like nginx. For the deployments I maintain
>> though, I don't think I'm setting the relevant headers, but I'll look at
>> changing that.
>
> Cool.
>
>> Going back to the %narinfo-transient-error-ttl, if I'm correct in saying
>> that it's not possible to override that, maybe that should also use the
>> relevant header value if set?
>
> Correct, ‘%narinfo-transient-error-ttl’ cannot be overridden.
> We can halve it if you think that’s useful, though when that happens,
> it means something’s wrong with the server (returning 500 or similar).
>
> I’ve sent patches to address this, lemme know what you think!

The patches you've sent look good.
* Re: bug#47897: [PATCH] substitutes: Don't cache negative lookups or transient errors.
From: Ludovic Courtès @ 2021-05-16 21:31 UTC
To: Christopher Baines; +Cc: guix-devel, 47897-done

Hi,

Christopher Baines <mail@cbaines.net> writes:

> The patches you've sent look good.

Pushed as 938ffcbb0589adc07dc12c79eda3e1e2bb9e7cf8 (I was generous and
lowered ‘%narinfo-negative-ttl’ to 10mn :-)).

Thanks,
Ludo’.
* Re: Narinfo negative and transient error caching
From: Ludovic Courtès @ 2021-04-22 22:14 UTC
To: Christopher Baines; +Cc: guix-devel, 47897

BTW, one thing that would be interesting too is to return 404 with a
long ‘Cache-Control’ validity when the requested store item is among the
cached failures.

We could also add an extra response header to explicitly communicate
that the store item is known to fail to build.

Ludo’.
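To make the idea in the message above concrete, here is a small Python sketch of how a server might choose the headers for a 404 narinfo reply. Everything in it is hypothetical: the function name, the one-week failure TTL, and in particular the `X-Known-Build-Failure` header, which is an invented placeholder for the “extra response header” and not something ‘guix publish’ is known to send.

```python
from typing import Dict, Optional

# Hypothetical "long validity" for items known to fail to build.
FAILURE_TTL = 7 * 24 * 3600  # one week, chosen arbitrarily for the sketch


def negative_reply_headers(known_build_failure: bool,
                           negative_ttl: Optional[int] = None) -> Dict[str, str]:
    """Return extra headers for a 404 narinfo reply.

    known_build_failure: the item is among the server's cached failures.
    negative_ttl: operator-chosen TTL for ordinary 404s, or None to send
    no Cache-Control header and let clients use their own default.
    """
    headers: Dict[str, str] = {}
    if known_build_failure:
        # Clients need not come back soon: the item is not expected to appear.
        headers["Cache-Control"] = f"max-age={FAILURE_TTL}"
        # Invented header name, for illustration only.
        headers["X-Known-Build-Failure"] = "1"
    elif negative_ttl is not None:
        headers["Cache-Control"] = f"max-age={negative_ttl}"
    return headers
```

The second branch corresponds to the operator-chosen negative TTL discussed earlier in the thread: passing `negative_ttl=0` would tell clients not to cache the 404 at all.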