Debugging Nginx Cache Misses: Hitting high number of MISS despite high proxy valid

Quintin Par quintinpar at gmail.com
Tue May 15 15:34:33 UTC 2018


Thank you so much for this Peter. Very helpful.



For what it’s worth, I run a static WordPress website, so the configuration
should not be very complicated.



The link that you provided also led me to
https://github.com/perusio/wordpress-nginx



To answer your queries:



> 1. Is this a website that's on the internet, and thus exposed to random
> queries from bots and scrapers that you can’t control?

Yes, and it gets a lot of the scammy attacks typical of WordPress websites.
I’ve enabled connection limiting and request limiting for WordPress, along
with fail2ban on the request-limiting rule.



> 2. For your cache misses, how long best case, typical and worst case does
> your back-end take to build the pages?

I run a cache warmer script and I expect all the pages to stay cached for
120 days. The script is run every week and takes 1 hour.



4. Instead of $request_uri, what’s the right variable that excludes all
parameters? Is it $uri?
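
A minimal sketch of what such a key could look like, assuming $uri (the
normalized request path, without the query string) is the variable in
question. The $arg_s handling is a hypothetical addition for WordPress search
URLs and is not something from this thread; any page whose content depends on
a parameter would need similar treatment, because with this key all
query-string variants of a path share one cache entry.

```nginx
# Sketch only: key on the path ($uri) instead of the full original URI
# ($request_uri), so /page/?random=1 and /page/?random=2 map to one entry.
proxy_cache_key "$scheme://$host$uri$do_not_cache";

# Assumption: search results (?s=...) vary by parameter, so they are neither
# served from nor stored in the cache. $arg_s is empty for all other requests.
proxy_cache_bypass $arg_nocache $do_not_cache $arg_s;
proxy_no_cache     $arg_s;
```

Note that $uri can also change after a rewrite or an internal redirect, so the
key reflects the path as nginx sees it at the point where the request is
proxied.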



> 9. This script is very useful to track the health of your cache:

Thank you for this.



Based on your response, my suspicion is that URL parameters might be the
culprit here. But I wish there were a way to diagnose the root cause
directly. Do you know of any parameter or variable that I can write to the
access log for this?
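
As a starting point, here is a hedged sketch of a log format that records the
cache status next to the inputs that could change the key or trigger a bypass.
The format name cache_debug and the log path are made up for illustration;
$do_not_cache is the variable set in the config quoted further down, and the
rest are standard nginx variables. As far as I can tell nginx does not log a
reason for a MISS, so the idea is to compare MISS lines for the same $uri and
see what differs.

```nginx
# Sketch only; log_format belongs in the http {} block.
# Caution: this logs request cookies, which may contain session data.
log_format cache_debug '$remote_addr [$time_local] "$request" $status '
                       'cache=$upstream_cache_status '
                       'uri="$uri" args="$args" '
                       'bypass="$arg_nocache|$do_not_cache" '
                       'cookie="$http_cookie" '
                       'up_set_cookie="$upstream_http_set_cookie" '
                       'up_cache_control="$upstream_http_cache_control" '
                       'up_expires="$upstream_http_expires" '
                       'up_time=$upstream_response_time';

# Hypothetical path; place this in the server or location that proxies.
access_log /var/log/nginx/cache_debug.log cache_debug;
```

With this in place, many MISS lines that share a uri but differ in args would
point at query parameters, while BYPASS or EXPIRED in the cache field would
point at the bypass conditions or at expiry instead.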



- Quintin



On Mon, May 14, 2018 at 11:08 AM Peter Booth <peter_booth at me.com> wrote:

>
> Quintin,
>
> I don't know anything about your context, but your setup looks overly
> simplistic. Here are some things that I learned painfully over a few
> years of supporting a high-traffic retail website:
>
> 1. Is this a website that's on the internet, and thus exposed to random
> queries from bots and scrapers that you can’t control?
>
> 2. For your cache misses, how long best case, typical and worst case does
> your back-end take to build the pages?
>
> 3. You need to log everything that could feasibly affect the status of the
> site. For example, here’s a log configuration from one gnarly site that I
> worked on:
>
>     log_format main '$http_x_forwarded_for $http_true_client_ip $remote_addr - $remote_user [$time_local] $host "$request" '
>                     '$status $body_bytes_sent $upstream_cache_status $cookie_jsessionid $http_akamai_country $cookie_e4x_country $cookie_e4x_currency "$http_referer" '
>                     '"$http_user_agent" "$request_time"';
>
> 4. The first problem is your cache key: it includes $request_uri, which is
> the original URI *including all arguments*. So you are already exposed to
> DoS requests, which could even be unintentional, as anyone can bust your
> cache by adding an extra parameter.
>
>     proxy_cache_key "$scheme://$host$request_uri$do_not_cache";
>
> 5. Not caching requests from logged-in users is a very blunt tool. Is this
> a site where only administrative users are logged in?
>
> Imagine a retail site that sells clothing. It’s possible that a dynamic
> page that lists all the red dresses is something a logged-in user sees.
> Perhaps the page can be cached? But if there is a version of the page that
> shows 30 entries and another that shows 60, then they need to be
> disambiguated by the cache key. Perhaps users can choose to see prices in
> Euro instead of USD? Then this also belongs in the key. If I am an American
> vacationing in Paris, then perhaps the default behavior should be to show
> me Euro prices, based on the value of a cookie that the CDN sets. In that
> situation the customer may want to override this default behavior and
> insist he sees USD prices. You can see how complex this can get.
>
> 7. The default behavior is to not cache responses that contain a
> Set-Cookie header. Imagine how cache pollution (sending someone another
> person’s personal data stored in a cookie) could be much worse than a cache
> miss. But there are also settings where your backend is some legacy
> software that you don't control, and the correct behavior isn’t to skip
> caching but instead to remove the Set-Cookie header from the response and
> cache the response without it.
>
> 8. How you prime the cache, monitor the cache, and clear the cache are
> crucial. Perhaps you have a script that uses curl or wget to retrieve a
> series of pages from your site. If the script is written naively, then each
> step might cause a new servlet session to be created on the backend,
> producing a memory issue.
>
> 9. This script is very useful to track the health of your cache:
>
> https://github.com/perusio/nginx-cache-inspector
>
> 10. The if directive in nginx has some issues (see
> https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/ ).
> When I need to use complex configuration logic I use OpenResty. OpenResty
> is a bundle that combines the standard nginx with some additional Lua
> modules. It’s still standard nginx, not forked or customized in any way.
>
> 11.
>
> A very cut down version of a cache config for one page follows:
>
> # Product arrays get cached
>         location ~ /shop/ {
>             rewrite "/(.*)/2];ord.*$" $1 ;
>             proxy_no_cache $arg_mid $arg_siteID;
>             proxy_cache_bypass $arg_mid $arg_siteID;
>             proxy_cache_use_stale updating;
>             default_type text/html;
>             proxy_cache_valid 200 302 301 15m;
>             proxy_ignore_headers Set-Cookie Cache-Control;
>             proxy_pass_header off;
>             proxy_hide_header Set-Cookie;
>             expires 900s;
>             add_header  Last-Modified "";
>             add_header  ETag "";
>             # Build cache key
>             set $e4x_currency $cookie_e4x_currency;
>             set_if_empty $e4x_currency 'USD';
>             set $num_items $cookie_EndecaNumberOfItems;
>             set_if_empty $num_items 'LOW';
>             proxy_cache_key "$uri|$e4x_currency|$num_items";
>             proxy_cache product_arrays;
>             # Add Canonical URL string
>             set $folder_id $arg_FOLDER%3C%3Efolder_id;
>             set $canonical_url "http://$http_host$uri";
>             add_header Link "<$canonical_url>; rel=\"canonical\"";
>             proxy_pass http://apache$request_uri;
>         }
>
>
> This snippet shows a key made of three parts. The real version has seven
> parts.
>
> Good luck!
>
> Peter
>
>
> On 14 May 2018, at 12:06 AM, Quintin Par <quintinpar at gmail.com> wrote:
>
> Thanks all for the response. Michael, I am going to add those header
> ignores.
>
>
> I’m still puzzled by the large number of MISSes and I’ve no clue why they
> are happening. Leads appreciated.
>
>
>
>
>
>
> - Quintin
>
> On Sun, May 13, 2018 at 6:12 PM, c0nw0nk <nginx-forum at forum.nginx.org>
> wrote:
>
>> You know you can DoS sites with cache MISSes by switching up URL params and
>> arguments.
>>
>> Examples :
>>
>> HIT :
>> index.php?var1=one&var2=two
>> MISS :
>> index.php?var2=two&var1=one
>>
>> MISS :
>> index.php?random=1
>> index.php?random=2
>> index.php?random=3
>> etc etc
>>
>> Inserting random arguments into URLs will cause cache misses, and changing
>> the order of existing valid URL arguments will also cause misses.
>>
>> Cherian Thomas Wrote:
>> -------------------------------------------------------
>> > Thanks for this Michael.
>> >
>> >
>> >
>> > This is so surprising. If someone decides to DoS the website and crawls
>> > it with a rogue header, this will essentially bypass the cache and put a
>> > strain on the website. In fact, I was hit by a DoS attack; that’s when I
>> > started looking at the logs and realized the large number of MISSes.
>> >
>> >
>> >
>> > Can someone please help?
>> >
>> >
>> >
>> - Quintin
>>
>> >
>> > On Sat, May 12, 2018 at 12:01 PM, Friscia, Michael
>> > <michael.friscia at yale.edu
>> > > wrote:
>> >
>> > > I'm not sure if this will help, but I ignore/hide a lot. This is in my
>> > > config:
>> > >
>> > >
>> > > proxy_ignore_headers X-Accel-Expires Expires Cache-Control Set-Cookie;
>> > > proxy_hide_header X-Accel-Expires;
>> > > proxy_hide_header Pragma;
>> > > proxy_hide_header Server;
>> > > proxy_hide_header Request-Context;
>> > > proxy_hide_header X-Powered-By;
>> > > proxy_hide_header X-AspNet-Version;
>> > > proxy_hide_header X-AspNetMvc-Version;
>> > >
>> > >
>> > > I have not experienced the problem you mention, I just thought I would
>> > > offer my config.
>> > >
>> > >
>> > > ___________________________________________
>> > >
>> > > Michael Friscia
>> > >
>> > > Office of Communications
>> > >
>> > > Yale School of Medicine
>> > >
>> > > (203) 737-7932 – office
>> > >
>> > > (203) 931-5381 – mobile
>> > >
>> > > http://web.yale.edu
>> > >
>> > >
>> > > ------------------------------
>> > > *From:* nginx <nginx-bounces at nginx.org> on behalf of Quintin Par <
>> > > quintinpar at gmail.com>
>> > > *Sent:* Saturday, May 12, 2018 1:32 PM
>> > > *To:* nginx at nginx.org
>> > > *Subject:* Re: Debugging Nginx Cache Misses: Hitting high number of
>> > MISS
>> > > despite high proxy valid
>> > >
>> > >
>> > > That’s the tricky part. These MISSes are intermittent. Whenever I run
>> > > curl I get HITs, but I end up seeing a lot of MISSes in the logs.
>> > >
>> > >
>> > >
>> > > How do I log these MISSes with the reason? I want to know what headers
>> > > ended up bypassing the cache.
>> > >
>> > >
>> > >
>> > > Here’s my caching config
>> > >
>> > >
>> > >
>> > >                 proxy_pass http://127.0.0.1:8000;
>> > >
>> > >                 proxy_set_header X-Real-IP  $remote_addr;
>> > >
>> > >                 proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
>> > >
>> > >                 proxy_set_header X-Forwarded-Proto https;
>> > >
>> > >                 proxy_set_header X-Forwarded-Port 443;
>> > >
>> > >
>> > >
>> > >                 # If logged in, don't cache.
>> > >
>> > >                 if ($http_cookie ~* "comment_author_|wordpress_(?!test_cookie)|wp-postpass_") {
>> > >
>> > >                   set $do_not_cache 1;
>> > >
>> > >                 }
>> > >
>> > >                 proxy_cache_key "$scheme://$host$request_uri$do_not_cache";
>> > >
>> > >                 proxy_cache staticfilecache;
>> > >
>> > >                 add_header Cache-Control public;
>> > >
>> > >                 proxy_cache_valid       200 120d;
>> > >
>> > >                 proxy_hide_header "Set-Cookie";
>> > >
>> > >                 proxy_ignore_headers  "Set-Cookie";
>> > >
>> > >                 proxy_ignore_headers  "Cache-Control";
>> > >
>> > >                 proxy_hide_header "Cache-Control";
>> > >
>> > >                 proxy_pass_header X-Accel-Expires;
>> > >
>> > >
>> > >
>> > >                 proxy_set_header Accept-Encoding "";
>> > >
>> > >                 proxy_ignore_headers Expires;
>> > >
>> > >                 add_header X-Cache-Status $upstream_cache_status;
>> > >
>> > >                 proxy_cache_use_stale   timeout;
>> > >
>> > >                 proxy_cache_bypass $arg_nocache $do_not_cache;
>> > > - Quintin
>> > >
>> > >
>> > > On Sat, May 12, 2018 at 10:29 AM Lucas Rolff <lucas at lucasrolff.com>
>> > wrote:
>> > >
>> > > It can be as simple as doing a curl to your “origin” URL (the one you
>> > > proxy_pass to) for the files that you see getting a lot of MISSes. If
>> > > there are odd headers such as cookies etc., then you’ll most likely
>> > > experience a bad cache if your nginx is not configured to ignore those
>> > > headers.
>> > >
>> > >
>> > >
>> > > *From: *nginx <nginx-bounces at nginx.org> on behalf of Quintin Par <
>> > > quintinpar at gmail.com>
>> > > *Reply-To: *"nginx at nginx.org" <nginx at nginx.org>
>> > > *Date: *Saturday, 12 May 2018 at 18.26
>> > > *To: *"nginx at nginx.org" <nginx at nginx.org>
>> > > *Subject: *Debugging Nginx Cache Misses: Hitting high number of MISS
>> > > despite high proxy valid
>> > >
>> > >
>> > >
>> > >
>> > > My proxy cache path is set to a very high size
>> > >
>> > >
>> > >
>> > > proxy_cache_path  /var/lib/nginx/cache  levels=1:2  keys_zone=staticfilecache:180m  max_size=700m;
>> > >
>> > > and the size used is only
>> > >
>> > >
>> > >
>> > > sudo du -sh *
>> > >
>> > > 14M cache
>> > >
>> > > 4.0K    proxy
>> > >
>> > > Proxy cache valid is set to
>> > >
>> > >
>> > >
>> > > proxy_cache_valid 200 120d;
>> > >
>> > > I track HIT and MISS via
>> > >
>> > >
>> > >
>> > > add_header X-Cache-Status $upstream_cache_status;
>> > >
>> > > Despite these settings I am seeing a lot of MISSes. And this is for pages
>> > > I intentionally ran a cache warmer on an hour ago.
>> > >
>> > >
>> > >
>> > > How do I debug why these MISSes are happening? How do I find out if the
>> > > miss was due to eviction, expiration, some rogue header etc? Does Nginx
>> > > provide commands for this?
>> > >
>> > >
>> > >
>> > > - Quintin
>> > > _______________________________________________
>> > > nginx mailing list
>> > > nginx at nginx.org
>> > > http://mailman.nginx.org/mailman/listinfo/nginx
>> > >
>> > >
>>
>> Posted at Nginx Forum:
>> https://forum.nginx.org/read.php?2,279764,279771#msg-279771
>>