Avoiding Nginx restart when rsyncing cache across machines

Lucas Rolff lucas at lucasrolff.com
Thu Sep 13 06:39:32 UTC 2018


> The cache is pretty big and I want to limit unnecessary requests if I can.

30 GB of cache and ~400k hits isn't a lot.

> Cloudflare is in front of my machines and I pay for load balancing, firewall, Argo among others. So there is a cost per request.

It doesn't matter that you pay for load balancing, firewall, Argo etc. – implementing a secondary caching layer won't increase your costs on the Cloudflare side of things, because that traffic doesn't go through Cloudflare at all but directly between machines. You'd connect your X edge locations to a smaller number of mid-tier locations, with direct traffic between your DigitalOcean instances – so no Cloudflare costs are involved.

Communication between your CDN servers and your origin server also (IMO) shouldn't go through any Cloudflare-related products, so additional hits on the origin will be "free", at the expense of slightly higher load. And since only a subset of locations would request from the origin, with those then serving as the origin for your other servers, you're effectively decreasing origin traffic.
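As a rough sketch of what such a secondary layer could look like (all hostnames, zone names and paths below are made up for illustration – adjust sizes and validity to your own setup), the edge servers proxy cache misses to a mid-tier cache, and only the mid-tier talks to the origin:

```nginx
# --- Edge server: caches locally, fetches misses from the mid-tier ---
proxy_cache_path /var/cache/nginx/edge levels=1:2 keys_zone=edge:100m
                 max_size=30g inactive=7d;

server {
    listen 443 ssl;
    server_name cdn.yourdomain.com;
    # ssl_certificate / ssl_certificate_key omitted for brevity

    location / {
        proxy_cache edge;
        proxy_cache_valid 200 1h;
        # misses go to the mid-tier cache, not the origin
        proxy_pass http://mid.internal.example;
    }
}

# --- Mid-tier server: the only layer that talks to the origin ---
proxy_cache_path /var/cache/nginx/mid levels=1:2 keys_zone=midtier:100m
                 max_size=100g inactive=30d;

server {
    listen 80;
    server_name mid.internal.example;

    location / {
        proxy_cache midtier;
        proxy_cache_valid 200 1h;
        proxy_pass https://origin.yourdomain.com;
    }
}
```

With this, ten edge misses for the same object still produce at most one origin request, because they all funnel through the mid-tier's cache.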

You should easily be able to get 97-99% offload of your origin (in my own setup, it's at 99.95% at this point), even without a secondary layer, and performance can be improved further with features such as:

http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_background_update

http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_use_stale

Nginx is smart enough to do a sub-request in the background to check whether the origin content has changed (using Last-Modified or ETags, for example) – this way origin communication stays minimal anyway.
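In config terms, that combination looks like the following (the zone name and values are examples, not recommendations):

```nginx
location / {
    proxy_cache edge;              # zone defined elsewhere via proxy_cache_path
    proxy_cache_valid 200 10m;
    # serve the stale copy while an update is in progress,
    # and also on upstream errors/timeouts
    proxy_cache_use_stale updating error timeout;
    # do the refresh as a background subrequest instead of
    # making the client wait for the upstream
    proxy_cache_background_update on;
    proxy_pass https://origin.yourdomain.com;
}
```

The `updating` parameter of proxy_cache_use_stale is what lets expired content be served immediately, while proxy_cache_background_update triggers the refresh behind the scenes.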

The only load balancer / Argo / firewall costs you should have are for the "CDN server -> end user" traffic, and those won't increase or decrease whether you do a normal proxy_cache setup or a setup with a secondary cache layer.

You also won’t increase costs by doing a warmup of your CDN servers – you could do something as simple as:

curl -o /dev/null -k -I --resolve cdn.yourdomain.com:443:127.0.0.1 https://cdn.yourdomain.com/img/logo.png

You could do the same with python or another language if you’re feeling more comfortable there.

Using a method like the above keeps your warmup "local", since you're resolving cdn.yourdomain.com to localhost; requests that are not yet cached will go to whatever is configured in proxy_pass in your nginx config.

> Admittedly I have a not so complex cache architecture. i.e. all cache machines in front of the origin and it has worked so far

I would say it's complex if you have to sync your content. Many pull-based CDNs simply do a normal proxy_cache + proxy_pass setup without syncing content, and then use nifty features (such as proxy_cache_background_update and proxy_cache_use_stale with the updating parameter) to decrease origin traffic, possibly adding a secondary layer if they're still doing a lot of origin traffic (e.g. because they have a lot of edge servers). If you're at around 10 servers, I wouldn't even consider a secondary layer unless your origin is under heavy load and can't handle 10 possible clients (the CDN servers).

Best Regards,
Lucas Rolff



