STALE responses taking as much as MISS responses

Reinis Rozitis r at roze.lv
Tue Feb 12 20:09:34 UTC 2019


> after applying 'tcp_nopush off', the test that we have in place is working as expected. The problem is that this improvement is not happening in production.
> Our production environment is mainly CDN -> nginx -> Origin. We want to use nginx to control the eviction time of the content (our use case needs a long stale-while-revalidate time, and the CDN prioritizes fresh content over stale). Our CDN gives us the latency of our nginx, and after applying the change we are not able to see any improvement. We have decided to put an ELB in front of nginx, just to have another way to measure, and we see the same behaviour.
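
If I understand correctly, the relevant part of your nginx config looks roughly like the following (just a minimal sketch on my side - zone name, paths and times are assumptions; since 1.11.10 nginx can also honor the stale-while-revalidate extension in the backend's Cache-Control header):

    proxy_cache_path /var/cache/nginx keys_zone=edge:100m inactive=7d max_size=10g;

    location / {
        proxy_pass        http://origin;
        proxy_cache       edge;
        proxy_cache_valid 200 10m;
        # keep serving expired content while it is being refreshed
        proxy_cache_use_stale updating error timeout;
    }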



In the case of CDN -> nginx -> Origin, does the latency appear also for HIT requests or only for STALE?

If you take nginx out and go CDN -> Origin, what latency do you see then?

Obviously, if your CDN doesn't cache anything, all of those will effectively be MISS requests, but maybe you can identify/measure whether the CDN itself adds noticeable extra time (depending on the setup, e.g. whether there is SSL offloading and on which end it happens). Some CDNs also use nginx on their edge servers (CloudFlare, for example, used to run a modified version), so in some configurations the usage of TCP_CORK may still be in effect.
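
One simple way to correlate that from the client side is to expose the cache status in a response header and then watch it with curl, both through the CDN and directly against nginx (a minimal sketch, the header name is arbitrary):

    # inside the location{} that does the proxy_cache
    add_header X-Cache-Status $upstream_cache_status;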


> On the other hand, we saw that $request_time, when STALE, is the time to refresh the cache, not the time to return the STALE content. 
> Could somebody confirm this? What would be the right metric to measure the real "latency" from the user's point of view (in our case, the CDN)?

$request_time represents the "time elapsed between the first bytes were read from the client and the log write after the last bytes were sent to the client".
So it shows the time between the request coming in from the CDN edge server and the response being written back to it. If the object is fetched from the backend / updated synchronously, that time is included as well.
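
(On a related note: if the refresh itself should not block the client, since 1.11.10 the update can be pushed into a background subrequest so that the STALE response is returned right away - a rough sketch, and it only works together with "proxy_cache_use_stale ... updating":)

    proxy_cache_use_stale         updating error timeout;
    proxy_cache_background_update on;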

For a more detailed picture you could also log $upstream_connect_time and $upstream_response_time to see how long it takes nginx to get the response from the backend, and then compare that with the timings you get on the client (curl) when requesting via the CDN or directly.
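
For example something along these lines (just a sketch, the format name and log path are made up):

    log_format timing '$remote_addr "$request" status=$status '
                      'cache=$upstream_cache_status rt=$request_time '
                      'uct=$upstream_connect_time urt=$upstream_response_time';

    access_log /var/log/nginx/timing.log timing;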

rr


