Debugging Nginx Cache Misses: Hitting high number of MISS despite high proxy valid

Peter Booth peter_booth at me.com
Mon May 14 15:07:47 UTC 2018


Quintin,

I don't know anything about your context, but your setup looks oversimplified. Here are some things that I learned,
painfully, over a few years of supporting a high-traffic retail website:

1. Is this a website that's on the internet, and thus exposed to random queries from bots and scrapers that you can’t control?

2. For your cache misses, how long does your back end take to build the pages in the best, typical, and worst case?

3. You need to log everything that could feasibly affect the status of the site. For example, here's a log configuration from one gnarly site that I worked on:

    log_format main '$http_x_forwarded_for $http_true_client_ip $remote_addr - $remote_user [$time_local] $host "$request" '
                      '$status $body_bytes_sent $upstream_cache_status $cookie_jsessionid $http_akamai_country $cookie_e4x_country $cookie_e4x_currency "$http_referer" '
                      '"$http_user_agent" "$request_time"';
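A sketch of a log format aimed at the question of why a given request missed, assuming stock nginx. The format name `cachemiss`, the log path, and the `$cache_key` variable are hypothetical; the zone `staticfilecache` and the upstream `127.0.0.1:8000` come from Quintin's config quoted below. Building the key in a variable means the exact string used for the cache lookup is also written to the log. `$upstream_cache_status` then separates MISS (no entry found) from EXPIRED (entry found but stale) and BYPASS. Logging `$http_cookie` writes personal data to disk, so treat it as a temporary debugging aid.

```nginx
# http {} level. Format name and log path are placeholders.
log_format cachemiss '$remote_addr [$time_local] "$request" $status '
                     'cache=$upstream_cache_status key="$cache_key" '
                     'upstream_time=$upstream_response_time '
                     'cookie="$http_cookie" ua="$http_user_agent"';

server {
    access_log /var/log/nginx/cache.log cachemiss;

    location / {
        # Build the key once so the cache lookup and the log line
        # use exactly the same string.
        set $cache_key "$scheme://$host$request_uri";
        proxy_cache_key $cache_key;
        proxy_cache staticfilecache;
        proxy_pass http://127.0.0.1:8000;
    }
}
```

Grouping the MISS lines by key shows whether the misses come from many distinct keys (cache busting through arguments or cookies) or from the same key over and over (expiry or eviction).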

4. The first problem is your cache key: it includes $request_uri, which is the original URI
including all arguments. So you are already exposed to DoS requests, intentional or not,
because anyone can bust your cache by adding an extra parameter.

>  proxy_cache_key "$scheme://$host$request_uri$do_not_cache";
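A minimal sketch of a key that ignores unknown arguments, assuming the backend only varies its output on two query arguments; `page` and `sort` are hypothetical names. Because the arguments are listed explicitly, their order in the request no longer matters either, which also covers the `var1`/`var2` reordering example quoted further down. Note that `proxy_pass` without a URI still forwards the original query string, so any argument left out of the key must not change the page.

```nginx
location / {
    # $uri is the path without the query string. Only the listed
    # arguments (hypothetical names) become part of the key, in a
    # fixed order, so ?sort=price&page=2 and ?page=2&sort=price share
    # one entry and ?random=123 adds nothing to the key.
    proxy_cache_key "$scheme://$host$uri?page=$arg_page&sort=$arg_sort";
    proxy_cache staticfilecache;
    proxy_pass http://127.0.0.1:8000;
}
```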


5. Not caching requests from logged-in users is a very blunt tool. Is this a site where only administrative users are logged in?

Imagine a retail site that sells clothing. It's possible that a dynamic page that lists all the red dresses is something
a logged-in user sees. Perhaps the page can be cached? But if there is one version of the page that shows 30 entries and another
that shows 60, then they need to be disambiguated by the cache key. Perhaps users can choose to see prices in euros instead of USD?
Then this also belongs in the key. If I am an American vacationing in Paris, then perhaps the default behavior should be to show me
euro prices, based on the value of a cookie that the CDN sets. In that situation the customer may want to override this default behavior
and insist on seeing USD prices. You can see how complex this can get.
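One way to put such variants into the key with stock nginx is a `map` per dimension, sketched below under assumptions: the `e4x_currency` cookie and the `apache` upstream are taken from the config later in this message, while the `items_per_page` cookie and the EUR-only whitelist are hypothetical. Mapping unknown values to a default keeps junk cookie values from creating new cache entries. This is an alternative to the `set_if_empty` approach in that config, not a copy of it, and it assumes the backend renders USD and 30 items whenever the cookie is missing or unrecognised.

```nginx
# http {} level. Anything other than EUR (missing cookie, junk) -> USD.
map $cookie_e4x_currency $currency_key {
    default USD;
    EUR     EUR;
}

# "items_per_page" is a hypothetical cookie holding the 30/60 choice.
map $cookie_items_per_page $page_size_key {
    default 30;
    60      60;
}

server {
    location /shop/ {
        proxy_cache_key "$scheme://$host$uri|$currency_key|$page_size_key";
        proxy_cache product_arrays;
        proxy_pass http://apache;
    }
}
```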

7. The default behavior is not to cache responses that contain a Set-Cookie header. Imagine cache pollution: sending someone another person's personal data stored in a cookie could be much worse than a cache miss. But there are also settings where your backend is some legacy software that you don't control,
and the correct behavior isn't to skip caching but instead to remove the Set-Cookie header from the response and cache the response without it.
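A sketch of the "strip it and cache anyway" case, assuming the cookie the legacy backend sets carries nothing user-specific; the `/catalog/` location is hypothetical and `product_arrays`/`apache` are borrowed from the config later in this message. The two directives only work as a pair: nginx stores the upstream headers together with the cached body, so ignoring Set-Cookie without hiding it would replay one visitor's cookie to everyone who gets a HIT.

```nginx
# "/catalog/" is a hypothetical location served by the legacy backend.
location /catalog/ {
    # Cache the response even though the backend sends Set-Cookie ...
    proxy_ignore_headers Set-Cookie;
    # ... but never pass that header on to clients, whether the
    # response comes from the backend or from the cache.
    proxy_hide_header Set-Cookie;

    proxy_cache product_arrays;
    proxy_cache_valid 200 15m;
    proxy_pass http://apache;
}
```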

8. How you prime the cache, monitor the cache, and clear the cache is crucial. Perhaps you have a script that uses curl or wget to retrieve a series of pages from your site. If the script is written naively (for example, it never sends back the session cookie it receives), then each request might cause a new servlet session to be created on the backend, producing a memory issue.
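A sketch of one priming approach in stock nginx, under assumptions: the warming script runs on the nginx host itself and sends a hypothetical `X-Cache-Warm: 1` header; the zone and upstream come from Quintin's config quoted below. `proxy_cache_bypass` skips the cache lookup, and because `proxy_no_cache` is not set for these requests, the fresh backend response replaces the cached copy. On the script side, curl's `-c`/`-b` cookie-jar options let every request reuse one backend session instead of creating a new one each time.

```nginx
# http {} level. Only requests from the local host may force a refresh.
geo $warm_allowed {
    default   0;
    127.0.0.1 1;
}

# "X-Cache-Warm" is a hypothetical header sent by the warming script.
map "$warm_allowed:$http_x_cache_warm" $warm_refresh {
    default 0;
    "1:1"   1;
}

server {
    location / {
        # $warm_refresh = 1: skip the lookup, fetch from the backend,
        # and store the new response under the usual key.
        proxy_cache_bypass $warm_refresh;
        proxy_cache staticfilecache;
        proxy_pass http://127.0.0.1:8000;
    }
}
```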

9. This script is very useful for tracking the health of your cache:

https://github.com/perusio/nginx-cache-inspector

10. The if directive in nginx has some issues (see https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/).
When I need complex configuration logic I use OpenResty. OpenResty is a bundle that
combines standard nginx with some additional Lua modules. It's still standard nginx,
not forked or customized in any way.
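For the specific `if` in Quintin's config quoted below, a plain `map` is a stock-nginx alternative that avoids `if` entirely (this is not the OpenResty route described above). A minimal sketch, reusing his regex, variable name, zone, and upstream; adding `proxy_no_cache` is an assumption on my part that personalised responses should not be stored at all.

```nginx
# http {} level: $do_not_cache is 1 when a WordPress login, comment, or
# post-password cookie is present, 0 otherwise. Replaces the if block.
map $http_cookie $do_not_cache {
    default 0;
    "~*comment_author_|wordpress_(?!test_cookie)|wp-postpass_" 1;
}

server {
    location / {
        # Don't answer these requests from the cache ...
        proxy_cache_bypass $arg_nocache $do_not_cache;
        # ... and don't store their personalised responses either.
        proxy_no_cache $arg_nocache $do_not_cache;
        proxy_cache staticfilecache;
        proxy_pass http://127.0.0.1:8000;
    }
}
```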

11. A very cut-down version of a cache config for one page follows:

        # Product arrays get cached
        location ~ /shop/ {
            rewrite "/(.*)/2];ord.*$" $1 ;
            proxy_no_cache $arg_mid $arg_siteID;
            proxy_cache_bypass $arg_mid $arg_siteID;
            proxy_cache_use_stale updating;
            default_type text/html;
            proxy_cache_valid 200 302 301 15m;
            proxy_ignore_headers Set-Cookie Cache-Control;
            proxy_hide_header Set-Cookie;
            expires 900s;
            add_header  Last-Modified "";
            add_header  ETag "";
            # Build cache key
            # (set_if_empty comes from the set-misc module bundled with OpenResty)
            set $e4x_currency $cookie_e4x_currency;
            set_if_empty $e4x_currency 'USD';
            set $num_items $cookie_EndecaNumberOfItems;
            set_if_empty $num_items 'LOW';
            proxy_cache_key "$uri|$e4x_currency|$num_items";
            proxy_cache product_arrays;
            # Add Canonical URL string
            set $folder_id $arg_FOLDER%3C%3Efolder_id;
            set $canonical_url "http://$http_host$uri";
            add_header Link "<$canonical_url>; rel=\"canonical\"";
            proxy_pass http://apache$request_uri;
        }


This snippet shows a key made of three parts. The real version has seven parts.
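The location above uses a `product_arrays` zone, which has to be declared at http level with `proxy_cache_path`. A sketch with paths and sizes that are assumptions. One parameter is worth checking against Quintin's original question quoted below: his `proxy_cache_path` sets no `inactive` value, and according to the nginx documentation the default is 10 minutes. A page warmed an hour ago and not requested since would therefore already have been removed, despite `proxy_cache_valid 200 120d`. Separately, once `max_size` is exceeded the cache manager removes the least recently used entries.

```nginx
# http {} level. Paths, sizes and times are illustrative assumptions.
# keys_zone: shared memory for the keys; 1m holds roughly 8,000 keys.
# inactive:  entries not requested within this period are deleted even
#            if proxy_cache_valid still considers them fresh (default 10m).
proxy_cache_path /var/lib/nginx/cache levels=1:2
                 keys_zone=staticfilecache:180m max_size=700m inactive=120d;

proxy_cache_path /var/lib/nginx/product_arrays levels=1:2
                 keys_zone=product_arrays:50m max_size=1g inactive=1h;
```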

Good luck!

Peter


> On 14 May 2018, at 12:06 AM, Quintin Par <quintinpar at gmail.com> wrote:
> 
> 
> Thanks all for the responses. Michael, I am going to add those header ignores.
> 
> I'm still puzzled by the large number of MISSes and I've no clue why they are happening. Leads appreciated.
>  
>  
> 
> 
> - Quintin
> 
> On Sun, May 13, 2018 at 6:12 PM, c0nw0nk <nginx-forum at forum.nginx.org> wrote:
> You know you can DoS sites with cache MISSes by switching up URL params and
> arguments.
> 
> Examples:
> 
> HIT:
> index.php?var1=one&var2=two
> MISS:
> index.php?var2=two&var1=one
> 
> MISS:
> index.php?random=1
> index.php?random=2
> index.php?random=3
> etc.
> 
> Inserting random arguments into URLs will cause cache misses, and changing
> the order of existing valid URL arguments will also cause misses.
> 
> Cherian Thomas wrote:
> -------------------------------------------------------
> > Thanks for this, Michael.
> > 
> > This is so surprising. If someone decides to DoS the site and crawls it
> > with a rogue header, this will essentially bypass the cache and put a
> > strain on the website. In fact, I was hit by a DoS attack; that's when I
> > started looking at logs and realized the large number of MISSes.
> > 
> > Can someone please help?
> > 
> > - Cherian
> > 
> > On Sat, May 12, 2018 at 12:01 PM, Friscia, Michael
> > <michael.friscia at yale.edu> wrote:
> > 
> > > I'm not sure if this will help, but I ignore/hide a lot, this is in my
> > > config
> > >
> > > proxy_ignore_headers X-Accel-Expires Expires Cache-Control Set-Cookie;
> > > proxy_hide_header X-Accel-Expires;
> > > proxy_hide_header Pragma;
> > > proxy_hide_header Server;
> > > proxy_hide_header Request-Context;
> > > proxy_hide_header X-Powered-By;
> > > proxy_hide_header X-AspNet-Version;
> > > proxy_hide_header X-AspNetMvc-Version;
> > >
> > > I have not experienced the problem you mention, I just thought I would
> > > offer my config.
> > >
> > >
> > > ___________________________________________
> > >
> > > Michael Friscia
> > >
> > > Office of Communications
> > >
> > > Yale School of Medicine
> > >
> > > (203) 737-7932 – office
> > >
> > > (203) 931-5381 – mobile
> > >
> > > http://web.yale.edu
> > >
> > >
> > > ------------------------------
> > > *From:* nginx <nginx-bounces at nginx.org> on behalf of Quintin Par <quintinpar at gmail.com>
> > > *Sent:* Saturday, May 12, 2018 1:32 PM
> > > *To:* nginx at nginx.org
> > > *Subject:* Re: Debugging Nginx Cache Misses: Hitting high number of MISS despite high proxy valid
> > >
> > >
> > > That's the tricky part. These MISSes are intermittent. Whenever I run curl
> > > I get HITs, but I end up seeing a lot of MISSes in the logs.
> > >
> > > How do I log these MISSes with the reason? I want to know what headers
> > > ended up bypassing the cache.
> > >
> > >
> > >
> > > Here's my caching config:
> > >
> > >     proxy_pass http://127.0.0.1:8000;
> > >
> > >     proxy_set_header X-Real-IP  $remote_addr;
> > >     proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
> > >     proxy_set_header X-Forwarded-Proto https;
> > >     proxy_set_header X-Forwarded-Port 443;
> > >
> > >     # If logged in, don't cache.
> > >     if ($http_cookie ~* "comment_author_|wordpress_(?!test_cookie)|wp-postpass_" ) {
> > >       set $do_not_cache 1;
> > >     }
> > >
> > >     proxy_cache_key "$scheme://$host$request_uri$do_not_cache";
> > >     proxy_cache staticfilecache;
> > >     add_header Cache-Control public;
> > >     proxy_cache_valid       200 120d;
> > >     proxy_hide_header "Set-Cookie";
> > >     proxy_ignore_headers  "Set-Cookie";
> > >     proxy_ignore_headers  "Cache-Control";
> > >     proxy_hide_header "Cache-Control";
> > >     proxy_pass_header X-Accel-Expires;
> > >
> > >     proxy_set_header Accept-Encoding "";
> > >     proxy_ignore_headers Expires;
> > >     add_header X-Cache-Status $upstream_cache_status;
> > >     proxy_cache_use_stale   timeout;
> > >     proxy_cache_bypass $arg_nocache $do_not_cache;
> > > - Quintin
> > >
> > >
> > > On Sat, May 12, 2018 at 10:29 AM Lucas Rolff <lucas at lucasrolff.com> wrote:
> > >
> > > It can be as simple as doing a curl to your "origin" URL (the one you
> > > proxy_pass to) for the files you see getting a lot of MISSes. If there
> > > are odd headers such as cookies, then you'll most likely get poor
> > > caching if your nginx is configured not to ignore those headers.
> > >
> > >
> > >
> > > *From: *nginx <nginx-bounces at nginx.org> on behalf of Quintin Par <quintinpar at gmail.com>
> > > *Reply-To: *"nginx at nginx.org" <nginx at nginx.org>
> > > *Date: *Saturday, 12 May 2018 at 18.26
> > > *To: *"nginx at nginx.org" <nginx at nginx.org>
> > > *Subject: *Debugging Nginx Cache Misses: Hitting high number of MISS despite high proxy valid
> > >
> > >
> > >
> > >
> > > My proxy cache path is set to a very high size:
> > >
> > > proxy_cache_path  /var/lib/nginx/cache  levels=1:2 keys_zone=staticfilecache:180m  max_size=700m;
> > >
> > > and the size used is only:
> > >
> > > sudo du -sh *
> > > 14M     cache
> > > 4.0K    proxy
> > >
> > > Proxy cache valid is set to:
> > >
> > > proxy_cache_valid 200 120d;
> > >
> > > I track HIT and MISS via:
> > >
> > > add_header X-Cache-Status $upstream_cache_status;
> > >
> > > Despite these settings I am seeing a lot of MISSes, and this is for pages
> > > I intentionally warmed with a cache warmer an hour ago.
> > >
> > > How do I debug why these MISSes are happening? How do I find out if the
> > > miss was due to eviction, expiration, some rogue header, etc.? Does Nginx
> > > provide commands for this?
> > >
> > > - Quintin
> > > _______________________________________________
> > > nginx mailing list
> > > nginx at nginx.org <mailto:nginx at nginx.org>
> > > http://mailman.nginx.org/mailman/listinfo/nginx
> 
> Posted at Nginx Forum: https://forum.nginx.org/read.php?2,279764,279771#msg-279771
> 
