guix-devel

Re: Narinfo negative and transient error caching


From: Ludovic Courtès
Subject: Re: Narinfo negative and transient error caching
Date: Fri, 23 Apr 2021 00:11:31 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hi!

(“Sorry for the long delay” is officially my motto at this point.)

Christopher Baines <mail@cbaines.net> skribis:

> This has been on my mind for a while, as I wonder what effect it has on
> users fetching substitutes.
>
> The narinfo caching as I understand it works as follows:
>
>  Default success TTL => 36 hours
>  Negative TTL        => 1 hour
>  Transient error TTL => 10 minutes
>
> I'm ignoring the success TTL, I'm just interested in the negative and
> transient error values. Negative means that when a server says it
> doesn't have an output, that response will be cached for an
> hour. Transient errors are for other HTTP response codes, like 504.

You’re looking at the default TTLs, which are not the actual TTLs.
Specifically, servers can include a ‘Cache-Control’ header in their
reply specifying the TTL of their choice, and ‘guix substitute’ honors
that:

  https://git.savannah.gnu.org/cgit/guix.git/tree/guix/substitutes.scm#n200
  https://git.savannah.gnu.org/cgit/guix.git/tree/guix/scripts/publish.scm#n371

‘guix publish’ returns 404 with a TTL of 5 minutes when the requested
item is in the store but still needs to be “baked”.

However, ‘guix publish’ does not set ‘Cache-Control’ when the requested
item is not in the store.  In that case, clients fall back to
‘%narinfo-negative-ttl’ (1h).
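
To make the fallback concrete, here is a rough sketch of the TTL choice
described above.  It is Python rather than the Scheme found in
guix/substitutes.scm, and it is not the actual implementation: the
function names are made up for illustration, the default values are the
ones quoted in this thread, and applying the fixed transient-error TTL
to every status other than 200 and 404 is an assumption.

```python
import re

# Default TTLs in seconds, as quoted in this thread.
DEFAULT_TTL = 36 * 3600          # successful lookup
NEGATIVE_TTL = 3600              # 404, no 'Cache-Control' from the server
TRANSIENT_ERROR_TTL = 10 * 60    # other errors, e.g. 504

# Matches a 'max-age=N' directive at the start or after a comma.
_MAX_AGE = re.compile(r"(?:^|,)\s*max-age\s*=\s*(\d+)", re.IGNORECASE)


def max_age(cache_control):
    """Return the max-age (seconds) of a Cache-Control value, or None."""
    if not cache_control:
        return None
    match = _MAX_AGE.search(cache_control)
    return int(match.group(1)) if match else None


def narinfo_cache_ttl(status, cache_control=None):
    """Hypothetical helper: how long to cache a narinfo lookup result."""
    if status not in (200, 404):
        # Assumption: transient errors always get the fixed TTL.
        return TRANSIENT_ERROR_TTL
    ttl = max_age(cache_control)
    if ttl is not None:
        return ttl               # the server's choice wins
    return DEFAULT_TTL if status == 200 else NEGATIVE_TTL
```

With this sketch, a 404 carrying ‘Cache-Control: max-age=300’ is cached
for 5 minutes, while a bare 404 is cached for the full hour.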

> I had a look through the Git history, caching negative lookups has been
> a thing for a while. Caching transient errors was added, but I couldn't
> see why.

Transient error caching was most likely added in the days of
hydra.gnu.org, that VM that was extremely slow.  When overloaded, you’d
get 500 or similar, and at that point it was safer for clients to wait
and come back later, possibly much later.  :-)

> Personally I don't see a reason to keep either behaviour?

The main arguments for these negative TTLs are:

  1. Reducing server load: if the server doesn’t have libreoffice, don’t
     come back asking every 10s, it’s prolly useless.  You could easily
     have “GET storms” for libreoffice if clients don’t restrain
     themselves.

  2. Improving client performance: don’t GET things that are likely to
     fail.

Now, the penalty it imposes is annoying.  I’ve sometimes found myself
working around it, too (because I knew the server was going to have the
store item sooner than 1h).

Rather than removing it entirely, I can think of these options:

  1. Reduce the default negative timeouts.

  2. Add an option to ‘guix publish’ (and to the Coordinator?) so they
     send a ‘Cache-Control’ header with the chosen TTL on 404.  That
     way, if the server operator doesn’t mind extra load, they can run
     “guix publish --negative-ttl=0”.
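
Option 2 could look roughly like the following on the server side.
This is only a Python sketch of the idea, not ‘guix publish’ code: the
function name, the response body, and the way the option value reaches
it are all hypothetical, and ‘--negative-ttl’ itself is just the option
proposed above.

```python
def not_found_response(negative_ttl=None):
    """Build a 404 reply as a (status, headers, body) tuple.

    NEGATIVE_TTL is the operator-chosen TTL in seconds, or None to keep
    the behavior described earlier in this message (no 'Cache-Control',
    so clients fall back to their own default negative TTL).
    """
    headers = {"Content-Type": "text/plain"}
    if negative_ttl is not None:
        # Clients that honor 'Cache-Control' cache the 404 this long.
        headers["Cache-Control"] = f"max-age={negative_ttl}"
    return 404, headers, b"Resource not found.\n"


# Operator does not mind the extra load: tell clients not to cache 404s.
status, headers, _ = not_found_response(negative_ttl=0)
```

Here ‘max-age=0’ would mean clients may ask again right away; leaving
the option unset keeps today’s behavior.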

WDYT?  Does that make any sense?

Ludo’.


