Mark stale cache content as "invalid" on non-cacheable responses

Mindaugas Rasiukevicius rmind at noxt.eu
Wed Nov 18 18:56:38 UTC 2015


Maxim Dounin <mdounin at mdounin.ru> wrote:
> <...>
> > 
> > In your scenario, the upstream server requested such behaviour; it is a
> > transition point.
> 
> It didn't request anything.  It merely returned an error.
> 

I am afraid I cannot agree with this.  Cache-Control is a directive that
requests certain behaviour from a cache.  Think of 'no-cache' as a barrier
marking the necessary transition point.  RFC 7234 section 4.2.4 ("Serving
Stale Responses") seems clear on the stale case too (section 4 also makes
the obvious point that the most recent response should be obeyed):

   A cache MUST NOT generate a stale response if it is prohibited by an
   explicit in-protocol directive (e.g., by a "no-store" or "no-cache"
   cache directive, a "must-revalidate" cache-response-directive, or an
   applicable "s-maxage" or "proxy-revalidate" cache-response-directive;
   see Section 5.2.2).
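For concreteness, this is the kind of response under discussion (the
headers are abbreviated and purely illustrative):

   HTTP/1.1 200 OK
   Cache-Control: no-cache

Under the interpretation above, receiving such a response for a resource
bars the cache from falling back to its stale copy of that resource.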

> > The "worst thing" also happens if the response would
> > result in a temporary cacheable error.
> 
> And that's why returning a "temporary cacheable error" is a bad 
> idea if you are using proxy_cache_use_stale.
> 
> > This is primarily a question of
> > trusting/calibrating your upstream server (i.e. setting the
> > Cache-Control headers) vs deliberately overriding it.  There is no
> > "correct" handling in a general sense here, because this really depends
> > on the caching layers you build or integrate with.
> 
> I agree: there is no correct handling if you don't know your 
> upstream server behaviour.  By enabling use of stale responses you 
> agree that your upstream server will behave accordingly.  In your 
> scenario, the upstream server misbehaves, and this (expectedly) 
> causes the problem.

Why is temporary caching of an error a bad idea?  The upstream server in
my example had such a configuration deliberately; it did not misbehave.
For the given URI it serves dynamic content which must never be cached.
However, it also has a more general policy of caching errors for 3 seconds,
in order to defend a potentially struggling or failing origin.  That seems
like quite a practical reason; I think it is something used quite commonly
in the industry.
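To sketch that policy concretely (the status codes and values here are
illustrative, not the exact production configuration):

   # Dynamic content for the given URI -- must never be cached:
   HTTP/1.1 200 OK
   Cache-Control: no-cache

   # Any error -- cacheable briefly to shield the origin:
   HTTP/1.1 502 Bad Gateway
   Cache-Control: max-age=3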

> > Also, I would argue that the expectation is to serve the stale content
> > while the new content and its parameters are *unknown* (say, because
> > it is still being fetched).  The point here is that the upstream
> > server has made it known by serving a 200 and indicating the desire
> > for it not to be cached.  Let me put it this way: how else could the
> > upstream server tell the cache in front that it has to exit the
> > serve-stale state?  Currently, nginx gets stuck -- the only way to
> > eliminate those sporadic errors is to manually purge those stale files.
> 
> As of now, there is no way for the upstream server to control how 
> previously cached responses will be used to serve stale responses 
> (if nginx is configured to do so).

Again, as I interpret the RFC, the Cache-Control header *is* the way.

> You suggest to address it by making 200 + no-cache special, meaning 
> something like "please remove anything cached".  This disagrees 
> with the code you've provided though, as it makes any non-cacheable 
> response special.  Additionally, this disagrees with various use 
> cases where a non-cacheable response doesn't mean anything special, 
> but is rather an error, even if returned with status 200.  Or, in 
> some more complicated setups, it may be just a user-specific 
> response (which shouldn't be cached, in contrast to generic 
> responses to the same resource).

In the original case, nginx sporadically throws errors at users when there
is no real error, while temporarily caching errors when they do happen is
a beneficial and desired feature.  However, I do not think it really
matters whether one of the responses is an error or not.  Let's talk about
the generic case.  If we have a sequence of cacheable responses and then a
response with the Cache-Control header set to 'no-cache', then I believe
the cache must invalidate that content; otherwise it does not obey the
upstream server and does not preserve the consistency of the content.
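To illustrate the generic sequence (the timestamps and max-age values are
made up):

   t=0   GET /page  ->  200, Cache-Control: max-age=60   (stored)
   t=90  GET /page  ->  200, Cache-Control: no-cache     (entry is stale)

After t=90, the copy stored at t=0 should be treated as invalid; serving
it stale would contradict the most recent response.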

Let's put it this way: what is your use case, i.e. when is such behaviour
problematic?  If you have a location (object or page) where the upstream
server constantly mixes "cache me" and "don't cache me", then there is no
point in caching it (i.e. it is inherently non-cacheable content which
just busts your cache anyway).

> > Right, whether 504s specifically (and other timeouts) should be cached
> > is something that can be debated.  The real question here is what the
> > users want to achieve with proxy_cache_use_stale.  It is a mechanism
> > provided to avoid redundant requests to the upstream server, right?
> > And one aspect in particular is caching the errors for a very short
> > time to defend a struggling or failing upstream server.  I hope we
> > can agree that it is rather practical to recover from such a state.
> 
> Caching errors is not something proxy_cache_use_stale was 
> introduced for.  And this case rather contradicts 
> proxy_cache_use_stale assumptions about upstream server behaviour.  
> That is, two basic options are to either change the behaviour, or 
> to avoid using "proxy_cache_use_stale updating".

Perhaps it was not, but it provides such an option and the option is used
in the wild.  Again, the presence of an error here does not matter much,
as the real problem is obeying the upstream server's directives and
preserving the consistency.
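For reference, a minimal sketch of the kind of setup being discussed; the
directives are real nginx directives, but the zone name, times, status
codes and the "backend" upstream are illustrative:

   proxy_cache_path /var/cache/nginx keys_zone=app:10m;

   location / {
       proxy_pass              http://backend;
       proxy_cache             app;
       # Serve stale content while a fresh response is being fetched:
       proxy_cache_use_stale   error timeout updating;
       # Cache errors briefly to shield a struggling origin:
       proxy_cache_valid       500 502 504 3s;
   }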

> > Sporadically serving errors makes users unhappy.  However, it is not
> > even about the errors here.  You can also reproduce the problem with
> > different content, i.e. if the upstream server serves a cacheable HTTP
> > 200 (call it A) and then a non-cacheable HTTP 200 (call it B).  Some
> > clients will get A and some will get B (depending on who wins the
> > update race).  Hence the real problem is that nginx is not consistent:
> > it serves different content based on a *race condition*.  How exactly
> > is this beneficial or desirable?
> 
> This example is basically the same, so see above.
> 

Right, it is just a good illustration of the consistency problem.  I do
not really see a conceptual difference between the current nginx behaviour
and a database sporadically returning the result of some old transaction.
It's broken.
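To spell the race out (an illustrative timeline, assuming
"proxy_cache_use_stale updating" is enabled):

   1. The cached A expires; the next request becomes the updater and
      fetches B from the upstream.
   2. While that fetch is in flight, other requests are served the
      stale A.
   3. B carries 'no-cache', so it is not stored; A stays on disk, the
      entry remains stale, and the race repeats on every update.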

> Again, I don't say current behaviour is good.  It has an obvious 
> limitation, and it would be good to resolve this limitation.  But 
> the solution proposed doesn't look like a good one either.

Okay, so what solution do you propose?

-- 
Mindaugas


