Cache doesn't expire as should
ruz at sports.ru
Wed Dec 23 16:39:25 UTC 2015
On Wed, Dec 23, 2015 at 6:35 PM, Maxim Dounin <mdounin at mdounin.ru> wrote:
> On Wed, Dec 23, 2015 at 11:57:53AM +0300, Руслан Закиров wrote:
> > On Tue, Dec 22, 2015 at 4:49 PM, Maxim Dounin <mdounin at mdounin.ru>
> > > Hello!
> > >
> > > On Tue, Dec 22, 2015 at 02:43:11PM +0300, Руслан Закиров wrote:
> > >
> > > [...]
> > >
> > > > Our error log has "ignore long locked inactive cache entry" alerts,
> but I
> > > > really couldn't match it to "defreeze" event. Access log has
> > > STALE/UPDATING
> > > > requests between the alert and EXPIRED (cache updating) request.
> > >
> > > The "ignore long locked inactive cache entry" alerts indicate that
> > > a cache entry was locked by some request, and wasn't unlocked for
> > > a long time. The alert is expected to appear if a cache node is
> > > locked for cache inactive time (as set by proxy_cache_path
> > > inactive=, 10 minutes by default).
> > Inactive is defined in the config, but it's set to default 10m.
> > What happens with requests after this time? Do they hit backend and
> > cache? Do they use stale version?
> As long as an entry is locked and not accessed for the inactive
> time, it will cause the alert in question to be logged by cache
> manager. Nothing else happens - that is, the entry is still
> locked, and if it contains information that the entry is currently
> being updated, requests to this entry will return stale content
> with UPDATING status as per proxy_cache_use_stale.
> This alert may in theory happen in normal situation if a backend
> response takes longer than the inactive time set. But in general
> it indicates that there is a problem.
uwsgi on backends has harakiri setting at 9 seconds and 10 seconds timeout
nginx. Index page takes 0.2-0.7 seconds to render.
(Note well that "locked" here should not be confused with
> request-level locks as in proxy_cache_lock.)
> > I'm going to check "long locked" messages in the log to see how many was
> > for "/" location.
> > The hash should be the same if we didn't change cache key, right?
> Any message indicate that there is a problem. If the "/" location
> is requested often, it'll likely won't be mentioned in the "ignore long
> locked" messages.
> > > Most likely reason is that a worker died or was killed
> > > while holding a lock on a cache node (i.e., while a request was
> > > waiting for a new response from a backend).
> > >
> > Shouldn't be there a record in error log? Error log level at warn.
> There should be something like "worker process exited" messages,
> though at least some of them may be on the "notice" level (e.g.,
> if you force nginx worker to quit by sending SIGTERM to it
> > > Trivial things to consider:
> > >
> > > - check logs for segmentation faults;
> > no seg faults in logs
> Any other alerts, crit or emerg messages? Any alerts about
> socket leaks on configuration reloads ("open socket left..."),
> sockets in CLOSED state? How long relevant requests to upstream
> server take (i.e., make sure that the behaviour observed isn't a
> result of an upstream server just being very slow)?
Looks like we have some misconfiguration and I get the following:
2015/12/17 01:49:13 [*crit*] 5232#0: *599273664 rename()
"/var/www/storage/img/ru-cyber/answers/10-5.jpg" failed (13: Permission
denied) while reading upstream, client: 126.96.36.199, server:
www.sports.ru, request: "GET /storage/img/ru-cyber/answers/10-5.jpg
HTTP/1.1", upstream: "
http://192.168.1.10:80/storage/img/ru-cyber/answers/10-5.jpg", host: "
www.sports.ru", referrer: "http://cyber.sports.ru/dota2/1034151992.html"
2015/12/17 01:50:17 [*crit*] 5229#0: *599313395 open()
"/var/www/storage/img/fantasy/shirts/ule/slo.png" failed (13: Permission
denied), client: 192.168.1.10, server: www.sports.ru, request: "GET
/storage/img/fantasy/shirts/ule/slo.png HTTP/1.0", host: "www.sports.ru",
> > - if you are using 3rd party modules / patches, try without them;
> > >
> > from freebsd port, updated gist  with `nginx -V` output
> >  https://gist.github.com/ruz/05456767750715f6b54e
> The only suspicious thing I see there is "ssi on" and
> "proxy_cache_lock" at the same time. There were some fixes
> related to proxy_cache_lock with subrequests in more recent
> > > - make sure you don't kill worker processes yourself or using some
> > > automation scripts (in particular, don't try to terminate old
> > > worker processes after a configuration reload).
> > >
> > One recent appearance of the problem was at 1:30AM and I checked
> > logs for crazy midnight deploys - nothing.
> > Also, we don't use anything custom to restart nginx, just regular
> > management tools.
> Just a regular kill will be enough screw up things. On the other
> hand, it may be an nginx bug as well.
> Note though, that the version you are using is quite old.
> Debugging anything but 1.9.x hardly make sense. You may want to
> try upgrading to the latest nginx-devel port.
As I said before 1.8.0 made things worse. May be we will try nginx 1.9.x on
one of frontends and see how it goes.
> Note well that if a problem happens, it's a good idea to either
> completely restart nginx or do binary upgrade procedure ("service
> nginx upgrade") to make sure shared memory structures are intact.
> Maxim Dounin
> nginx mailing list
> nginx at nginx.org
Руководитель отдела разработки веб-сервисов
+7(916) 597-92-69, ruz @ <http://www.sports.ru/>
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the nginx