NGINX stale-while-revalidate cluster

Sun Jul 9 20:58:59 UTC 2017

stale-while-revalidate is awesome, but it might not be the optimal tool here. It came out of Yahoo!,
the sixth largest website in the world, which used a small number of caching proxies. In their context
most content is served hot from cache. A cloud deployment typically means a larger number of VMs
that are each a fraction of a physical server. That is great for fine-grained control but a problem for cache hit rates.
So if you have much less traffic than Yahoo! spread across a larger number of web servers, then your hit rates
will suffer. What hit rates do you see today?
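
To put rough numbers on that, here is a toy model of the fresh-hit rate for a single URL whose requests are spread evenly over independent caches. It is a sketch under loud assumptions: evenly spaced requests, round-robin distribution, and one refetch per cache per TTL window. The function name and parameters are made up for illustration and say nothing about your real traffic.

```python
def hit_rate(requests_per_ttl: float, cache_count: int) -> float:
    """Approximate fresh-hit rate for one URL spread over cache_count caches.

    Assumption: requests are evenly spaced and split evenly between caches,
    so in each TTL window a cache misses once and hits on the rest.
    """
    per_cache = requests_per_ttl / cache_count
    if per_cache <= 1:
        # At most one request per TTL window per cache: the entry is
        # always absent or expired by the time the next request arrives.
        return 0.0
    return (per_cache - 1) / per_cache


# Same URL, same traffic (100 requests per TTL window), different fleet sizes.
for caches in (2, 10, 50):
    print(caches, "caches ->", hit_rate(100, caches))
```

With the same traffic, going from 2 caches to 50 takes this model from a 98% hit rate to 50%, which is the effect described above.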

Dynamic scale out isn’t very compatible with caching reverse proxies. 
Can you separate the caching reverse proxy functionality from the other functionality
and keep the number of caches constant, whilst scaling out the web servers?

You give the example of a few-hour-old page being served because of a scale-out event. Is that the most
common case of cache misses in your context, or is it unpopular pages and quiet times of the day?
Are these also served stale even when your server count is static?
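
For reference, the scale-out case can be reproduced with a small simulation. This is only a sketch: `pick_server`, `Cache` and `simulate` are hypothetical names, the balancer is assumed to map URLs with a simple hash modulo the server count, and every request is assumed to trigger a background refresh (that is, the TTL is shorter than the one-minute request interval).

```python
import zlib


def pick_server(url: str, server_count: int) -> int:
    """Hypothetical URI-hash balancer: map a URL to a server index."""
    return zlib.crc32(url.encode()) % server_count


class Cache:
    """One proxy's cache: url -> time (seconds) the entry was stored."""

    def __init__(self):
        self.stored_at = {}

    def serve(self, url: str, now: int) -> int:
        """Return the age of the response served, then refresh the entry.

        Mimics stale-while-revalidate: an existing entry is served as-is,
        however old, and is refreshed in the background.
        """
        age = now - self.stored_at.get(url, now)  # a miss counts as age 0
        self.stored_at[url] = now
        return age


def simulate(url: str, scale_out_at: int = 600, scale_in_at: int = 4 * 3600):
    """Request url once a minute; a second server exists only in between."""
    caches = [Cache(), Cache()]
    ages = []
    for now in range(0, scale_in_at + 120, 60):
        count = 2 if scale_out_at <= now < scale_in_at else 1
        ages.append((now, caches[pick_server(url, count)].serve(url, now)))
    return ages


# Pick a URL that moves to the second server when the fleet grows.
url = next(u for u in (f"/article/{i}" for i in range(1000))
           if pick_server(u, 2) == 1)
print(max(age for _, age in simulate(url)), "seconds: oldest response served")
```

In this model every response is at most a minute old except the first one after scale-in, which is served from the copy the first server stored just before scale-out.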

Finally, if the root of the problem is serving very stale content, could you simply delete that content 
throughout the day? A script that finds and removes all cached files older than five minutes wouldn’t
 take long to run.
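
A minimal sketch of such a script follows. The function name and its parameters are hypothetical, and it assumes the cache is an ordinary directory tree whose file modification times reflect when each entry was written. I have not verified how a running nginx reacts to files disappearing underneath it, so try this on a test box first.

```python
import os
import time


def purge_older_than(cache_dir, max_age_seconds, now=None):
    """Delete files under cache_dir whose mtime is older than max_age_seconds.

    Returns the list of removed paths.
    """
    now = time.time() if now is None else now
    removed = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                if now - os.path.getmtime(path) > max_age_seconds:
                    os.remove(path)
                    removed.append(path)
            except FileNotFoundError:
                # Something else removed the file between listing and deletion.
                continue
    return removed
```

Calling `purge_older_than(cache_dir, 5 * 60)` from cron every minute, with `cache_dir` set to the directory named in your proxy_cache_path, would cap how stale any surviving entry can be.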

Peter

> On 8 Jul 2017, at 4:28 PM, Joan Tomàs i Buliart <joan.tomas at marfeel.com> wrote:
> 
> Hi Peter,
> 
> 
> yes, it's true. I will try to explain our problem better.
> 
> We provide a mobile solution for newspapers and media groups. With this kind of partner, traffic peaks are common. We prefer to serve stale content (1 or 2 minutes stale, not more) rather than block the request for some seconds (the time our Tomcat back-end may spend crawling our customer's desktop site and generating the new content). As I tried to explain in my first e-mail, proxy_cache_background_update <http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_background_update> works fine as long as the number of servers is fixed and the LB in front of them balances by URI.
> 
> The major problem appears when the servers have to scale up and down. Imagine that URL1 is cached by Server 1. All requests for URL1 are directed to Server 1 by the LB. Suddenly, traffic rises and a new server is added. The LB remaps requests so that some URLs go to Server 2, and URL1 is one of them. Some hours later, traffic drops and Server 2 is removed. In this situation, new requests for URL1 that arrive at Server 1 will receive the version from some hours before (not some minutes). This is what we are trying to avoid.
> 
> Many thanks for all your feedback and suggestions,
> 
> 
> Joan
> On 08/07/17 15:30, Peter Booth wrote:
>> Perhaps it would help if, rather than focus on the specific solution that you are wanting, you instead explained your specific problem and business context?
>> 
>> What is driving your architecture? Is it about protecting a backend that doesn't scale or more about reducing latencies?
>> 
>> How many different requests are there that might be cached? What are the backend calls doing? How do cached objects expire? How long does a call to the backend take? 
>> Why is it OK to return a stale version of X to the first client but not OK to return a stale version to a second requester?
>> 
>> Imagine a scenario where two identical requests arrive from different clients and hit different web servers. Is it OK for both requests to be satisfied with a stale resource?
>> 
>> It's very easy for us to make incorrect assumptions about all of these questions because of our own experiences.
>> 
>> Peter
>> 
>> On Jul 8, 2017, at 9:00 AM, Joan Tomàs i Buliart <joan.tomas at marfeel.com> wrote:
>> 
>>> Thanks Owen!
>>> 
>>> We considered all the options in these 2 documents but, in our environment, where it is important to use stale-while-revalidate, each of them has at least one of these drawbacks: either it adds a layer in the fast path to the content, or it can't guarantee that one request for stale content will force the invalidation of all the copies of that object.
>>> 
>>> That is why we are looking for a "background" alternative to update the content.
>>> 
>>> Many thanks in any case,
>>> 
>>> Joan
>>> 
>>> On 07/07/17 16:04, Owen Garrett wrote:
>>>> There are a couple of options described here that you could consider if you want to share your cache between NGINX instances:
>>>> 
>>>> https://www.nginx.com/blog/shared-caches-nginx-plus-cache-clusters-part-1/ describes a sharded cache approach, where you load-balance by URI across the NGINX cache servers. You can combine your front-end load balancers and back-end caches onto one tier to reduce your footprint if you wish.
>>>> 
>>>> https://www.nginx.com/blog/shared-caches-nginx-plus-cache-clusters-part-2/ describes an alternative HA (shared) approach that replicates the cache so that there’s no increased load on the origin server if one cache server fails.
>>>> 
>>>> It’s not possible to share a cache across instances by using a shared filesystem (e.g. nfs).
>>>> 
>>>> ---
>>>> owen at nginx.com <mailto:owen at nginx.com>
>>>> Skype: owen.garrett
>>>> Cell: +44 7764 344779
>>>> 
>>>>> On 7 Jul 2017, at 14:39, Peter Booth <peter_booth at me.com> wrote:
>>>>> 
>>>>> You could do that but it would be bad. Nginx's great performance is based on serving files from a local disk and the behavior of the Linux page cache. If you serve from a shared (NFS) filesystem then every request is slower. You shouldn't slow down the common case just to increase cache hit rate.
>>>>> 
>>>>> On Jul 7, 2017, at 9:24 AM, Frank Dias <frank.dias at prodea.com> wrote:
>>>>> 
>>>>>> Have you thought about using a shared file system for the cache? This way all the nginx instances are looking at the same cached content.
>>>>>> 
>>>>>> On Jul 7, 2017 5:30 AM, Joan Tomàs i Buliart <joan.tomas at marfeel.com> wrote:
>>>>>> Hi Lucas
>>>>>> 
>>>>>> On 07/07/17 12:12, Lucas Rolff wrote:
>>>>>> > Instead of doing round robin load balancing why not do a URI based 
>>>>>> > load balancing? Then you ensure your cached file is only present on a 
>>>>>> > single machine behind the load balancer.
>>>>>> 
>>>>>> Yes, we considered this option but it forces us to deploy and maintain 
>>>>>> another layer (LB+NG+AppServer). All cloud providers have round-robin 
>>>>>> load balancers out of the box, but none provides a URI-based load 
>>>>>> balancer. Moreover, in our scenario, our web server layer is quite 
>>>>>> dynamic due to scaling up/down.
>>>>>> 
>>>>>> Best,
>>>>>> 
>>>>>> Joan
>>>>>> _______________________________________________
>>>>>> nginx mailing list
>>>>>> nginx at nginx.org <mailto:nginx at nginx.org>
>>>>>> http://mailman.nginx.org/mailman/listinfo/nginx <http://mailman.nginx.org/mailman/listinfo/nginx>
>>>>>> 
