guix-devel

Re: Narinfo negative and transient error caching


From: Christopher Baines
Subject: Re: Narinfo negative and transient error caching
Date: Fri, 23 Apr 2021 00:14:24 +0100
User-agent: mu4e 1.4.15; emacs 27.1

Ludovic Courtès <ludo@gnu.org> writes:

> Hi!
>
> (“Sorry for the long delay” is officially my motto at this point.)
>
> Christopher Baines <mail@cbaines.net> skribis:
>
>> This has been on my mind for a while, as I wonder what effect it has on
>> users fetching substitutes.
>>
>> The narinfo caching as I understand it works as follows:
>>
>>  Default success TTL => 36 hours
>>  Negative TTL        => 1 hour
>>  Transient error TTL => 10 minutes
>>
>> I'm ignoring the success TTL, I'm just interested in the negative and
>> transient error values. Negative means that when a server says it
>> doesn't have an output, that response will be cached for an
>> hour. Transient errors are for other HTTP response codes, like 504.
>
> You’re looking at the default TTLs, which are not the actual TTLs.
> Specifically, servers can include a ‘Cache-Control’ header in their
> reply specifying the TTL of their choice, and ‘guix substitute’ honors
> that:
>
>   https://git.savannah.gnu.org/cgit/guix.git/tree/guix/substitutes.scm#n200
>   https://git.savannah.gnu.org/cgit/guix.git/tree/guix/scripts/publish.scm#n371
>
> ‘guix publish’ returns 404 with a TTL of 5 minutes when the requested item
> is in the store but needs to be “baked”.
>
> However, ‘guix publish’ does not set ‘Cache-Control’ when the requested
> item is not in the store.  In that case, clients use ‘%narinfo-negative-ttl’
> (1h).
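To make the three server-side cases above concrete, here is a minimal Python sketch of how ‘guix publish’ answers a narinfo request, as described in this thread. It is only a model: the real code is Guile Scheme, and the function name `narinfo_response` and its boolean parameters are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical model of the behaviour described above, not the actual
# 'guix publish' implementation.
BAKING_TTL = 5 * 60  # seconds: 404 with a 5 minute TTL while baking


def narinfo_response(in_store: bool, baked: bool) -> Tuple[int, Dict[str, str]]:
    """Return (HTTP status, headers) for a narinfo request."""
    if in_store and baked:
        # Success; any configured success TTL header is left out of this sketch.
        return 200, {}
    if in_store:
        # The item exists but its nar is still being "baked": tell clients
        # to come back after BAKING_TTL seconds.
        return 404, {"Cache-Control": f"max-age={BAKING_TTL}"}
    # Not in the store: no Cache-Control header, so clients fall back to
    # their default negative TTL (%narinfo-negative-ttl, 1h).
    return 404, {}
```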

You're right that the negative ttl is just a default, so it's possible
to override the default behaviour in the success and negative lookup
cases, but I don't believe the Cache-Control header is used for
transient errors.
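As I understand it, the client side then picks a TTL roughly like this. This is a minimal Python sketch, not the Guile code in guix/substitutes.scm: `narinfo_ttl` and `max_age` are hypothetical names, and headers are modelled as a plain dict keyed by "Cache-Control" (real HTTP header names are case-insensitive).

```python
import re
from typing import Mapping, Optional

HOUR = 3600
# Default TTLs as listed earlier in the thread, in seconds.
DEFAULT_SUCCESS_TTL = 36 * HOUR
NEGATIVE_TTL = 1 * HOUR
TRANSIENT_ERROR_TTL = 10 * 60


def max_age(headers: Mapping[str, str]) -> Optional[int]:
    """Extract max-age from a Cache-Control header, if there is one."""
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    return int(match.group(1)) if match else None


def narinfo_ttl(status: int, headers: Mapping[str, str]) -> int:
    """How long to cache a narinfo lookup result, in seconds."""
    if status == 200:
        ttl = max_age(headers)
        return DEFAULT_SUCCESS_TTL if ttl is None else ttl
    if status == 404:
        ttl = max_age(headers)
        return NEGATIVE_TTL if ttl is None else ttl
    # Any other status (e.g. 504) is a transient error: the header is ignored.
    return TRANSIENT_ERROR_TTL
```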

>> I had a look through the Git history, caching negative lookups has been
>> a thing for a while. Caching transient errors was added, but I couldn't
>> see why.
>
> Transient error caching was most likely added in the days of
> hydra.gnu.org, that VM that was extremely slow.  When overloaded, you’d
> get 500 or similar, and at that point it was safer for clients to wait
> and come back later, possibly much later.  :-)
>
>> Personally, I don't see a reason to keep either behaviour?
>
> The main arguments for these negative TTLs are:
>
>   1. Reducing server load: if the server doesn’t have libreoffice, don’t
>      come back asking every 10s, it’s prolly useless.  You could easily
>      have “GET storms” for libreoffice if clients don’t restrain
>      themselves.
>
>   2. Improving client performance: don’t GET things that are likely to
>      fail.
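Both arguments come down to a client-side negative cache that suppresses repeated lookups until the TTL expires. A minimal Python sketch of that idea follows. The `NegativeCache` class is hypothetical, not Guix code; the injectable clock exists only to make the expiry easy to test.

```python
import time
from typing import Callable, Dict


class NegativeCache:
    """Remember 'not found' answers so repeated lookups skip the network."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._expiry: Dict[str, float] = {}

    def record_miss(self, url: str) -> None:
        # The server said it doesn't have this item; don't ask again for ttl seconds.
        self._expiry[url] = self.clock() + self.ttl

    def should_query(self, url: str) -> bool:
        expiry = self._expiry.get(url)
        if expiry is None:
            return True
        if self.clock() >= expiry:
            # The cached negative answer is stale: forget it and ask again.
            del self._expiry[url]
            return True
        return False
```

With a 1h TTL, a client that has just been told the libreoffice narinfo is missing won't issue another GET for it for an hour, which is what prevents the "GET storms".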

As you say, for the negative TTL, the question here is really what's the
best default value, if a server isn't specifying one.

Given that, when the response is negative, most narinfo requests are
followed by a build of that item, I have my doubts about the two
arguments above. This assumes the most common case is users asking Guix
to install and upgrade things.

If a user gets a negative response, they'll just build it instead and
not check for that narinfo again. Even if they cancel that build when
they realise they don't want to build libreoffice, they'll wait a bit
anyway before retrying.

> Now, the penalty it imposes is annoying.  I’ve sometimes found myself
> working around it, too (because I knew the server was going to have the
> store item sooner than 1h).
>
> Rather than removing it entirely, I can think of these options:
>
>   1. Reduce the default negative timeouts.

I think reducing it is good; as you say, it's possible to override the
default from the server side. In case someone wants caching behaviour,
it might be worth keeping that functionality at least.

>   2. Add an option to ‘guix publish’ (and to the Coordinator?) so they
>      send a ‘Cache-Control’ header with the chosen TTL on 404.  That
>      way, if the server operator doesn’t mind extra load, they can run
>      “guix publish --negative-ttl=0”.

That sounds sensible. The Guix Build Coordinator doesn't do any serving,
that's left to something else like nginx. For the deployments I maintain
though, I don't think I'm setting the relevant headers, but I'll look at
changing that.
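On the server side, the proposed option would amount to something like the Python sketch below. `not_found_headers` is a hypothetical helper, and `negative_ttl` models the proposed ‘--negative-ttl’ option (None meaning it wasn't given). It isn't how ‘guix publish’ is written. For deployments where nginx does the serving, an equivalent header would have to be configured there instead.

```python
from typing import Dict, Optional


def not_found_headers(negative_ttl: Optional[int]) -> Dict[str, str]:
    """Headers for a 404 on a narinfo whose item is not in the store."""
    if negative_ttl is None:
        # Current behaviour: no header, so clients use their 1h default.
        return {}
    if negative_ttl < 0:
        raise ValueError("negative_ttl must be >= 0")
    # With 0, clients shouldn't cache the negative answer at all.
    return {"Cache-Control": f"max-age={negative_ttl}"}
```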

Going back to %narinfo-transient-error-ttl: if I'm correct that it's not
possible to override it, maybe it should also use the relevant header
value if set?
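A minimal Python sketch of that suggestion: the header, when present, wins for every status, and the defaults only apply without it. `proposed_narinfo_ttl` and `max_age` are hypothetical names, and headers are again a plain dict keyed by "Cache-Control".

```python
import re
from typing import Mapping, Optional

# Default TTLs from earlier in the thread, in seconds.
DEFAULT_SUCCESS_TTL = 36 * 3600
NEGATIVE_TTL = 3600
TRANSIENT_ERROR_TTL = 10 * 60


def max_age(headers: Mapping[str, str]) -> Optional[int]:
    """Extract max-age from a Cache-Control header, if there is one."""
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    return int(match.group(1)) if match else None


def proposed_narinfo_ttl(status: int, headers: Mapping[str, str]) -> int:
    ttl = max_age(headers)
    if ttl is not None:
        # Proposed change: honour the server's TTL for transient errors too.
        return ttl
    if status == 200:
        return DEFAULT_SUCCESS_TTL
    if status == 404:
        return NEGATIVE_TTL
    return TRANSIENT_ERROR_TTL
```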


