Cache manager occasionally stops deleting cached files
Maxim Dounin
mdounin at mdounin.ru
Thu Feb 18 17:11:18 UTC 2016
Hello!
On Thu, Feb 18, 2016 at 11:20:55AM -0500, vedranf wrote:
> Hello,
>
> I'm having an issue where nginx (1.8) cache manager suddenly just stops
> deleting content thus the disk soon ends up being full until I restart it by
> hand. After it is restarted, it works normally for a couple of days, but
> then it happens again. Cache has some 30-40k files, nothing huge. Relevant
> config lines are:
>
> proxy_cache_path /home/cache/ levels=2:2 keys_zone=cache:25m
>                  inactive=7d max_size=2705g use_temp_path=on;
> proxy_temp_path /dev/shm/temp; # reduces parallel writes on the disk
> proxy_cache_lock on;
> proxy_cache_lock_age 10s;
> proxy_cache_lock_timeout 30s;
> proxy_ignore_client_abort on;
>
> Server gets roughly 100 rps and normally cache manager deletes a couple of
> files every few seconds, however when it gets stuck this is all it does for
> 20-30 minutes or more, i.e. there are 0 unlinks (until I restart it and it
> rereads the on-disk cache):
>
> ...
> epoll_wait(14, {}, 512, 1000) = 0
> epoll_wait(14, {}, 512, 1000) = 0
> epoll_wait(14, {}, 512, 1000) = 0
> epoll_wait(14, {}, 512, 1000) = 0
> gettid() = 11303
> write(24, "2016/02/18 08:22:02 [alert] 11303#11303: ignore long locked inactive cache entry 380d3f178017bcd92877ee322b006bbb, count:1\n", 123) = 123
> gettid() = 11303
> write(24, "2016/02/18 08:22:02 [alert] 11303#11303: ignore long locked inactive cache entry 7b9239693906e791375a214c7e36af8e, count:24\n", 124) = 124
> epoll_wait(14, {}, 512, 1000) = 0
> ...
>
> I assume the mentioned error is due to relatively often nginx restarts and
> is benign. There's nothing else in the error log (except for occasional
> upstream timeouts). I'm aware this likely isn't enough info to debug the
> issue, but do you at least have some ideas on what might be causing this
> issue, where to look? I'm wild guessing cache manager waits for some lock to
> be released, but it never gets released so it just waits indefinitely.
The error is logged when nginx is going to remove an inactive
cache entry, but the entry is locked by some requests. Unless the
inactive time is very low (not your case), this indicates a
problem somewhere else.
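To make the mechanism concrete, here is a toy Python model (not nginx's actual code) of an expiry pass over an LRU queue of cache entries, where each entry carries a count of requests currently using it. The names `Entry` and `expire_inactive` are hypothetical, and re-queueing a locked entry with a fresh expiry time is an assumption; the point is only that an entry with a non-zero count is skipped with an alert instead of being deleted.

```python
from __future__ import annotations

from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Entry:
    expire: float   # time after which the entry counts as inactive
    count: int = 0  # requests currently holding ("locking") the entry


def expire_inactive(queue: OrderedDict[str, Entry], now: float,
                    inactive: float) -> tuple[list[str], list[str]]:
    """One expiry pass over a toy LRU queue, oldest entry first.

    Returns (deleted keys, locked keys that were skipped).
    """
    deleted: list[str] = []
    skipped: list[str] = []
    for _ in range(len(queue)):                  # each entry at most once
        key, entry = next(iter(queue.items()))   # oldest entry
        if entry.expire > now:
            break                                # oldest still active: done
        if entry.count == 0:
            del queue[key]                       # unused and inactive
            deleted.append(key)
        else:
            # Locked: it cannot be removed. Assumption: it is moved to
            # the "newest" end of the queue with a fresh expiry time.
            entry.expire = now + inactive
            queue.move_to_end(key)
            skipped.append(key)
            print(f"ignore long locked inactive cache entry {key}, "
                  f"count:{entry.count}")
    return deleted, skipped


queue: OrderedDict[str, Entry] = OrderedDict()
queue["a"] = Entry(expire=10)            # inactive, unused
queue["b"] = Entry(expire=20, count=1)   # inactive, but still locked
queue["c"] = Entry(expire=500)           # still active
deleted, skipped = expire_inactive(queue, now=100, inactive=600)
```

After the pass, `a` is gone, `c` is untouched, and `b` is still there; in this model it will trigger the same alert again one `inactive` period later, for as long as its count stays non-zero.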
Such locked entries can't be removed from the cache. Additionally,
once there are enough such locked entries, nginx won't be able to
purge the cache based on max_size. That is, if you see such
messages, it's expected that nginx will have problems removing
entries from the cache.
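The max_size part can be illustrated with a second toy model, again not nginx's code: a purge pass walks the queue from the oldest end, deletes the first unlocked entry it finds, but gives up after skipping a bounded number of locked ones. `MAX_TRIES = 20` and `forced_purge` are made up for illustration; what matters is that once the oldest entries are all locked, the pass frees nothing even though the cache is over its limit.

```python
from __future__ import annotations

from collections import OrderedDict

MAX_TRIES = 20  # assumption: locked entries one pass is willing to skip


def forced_purge(queue: OrderedDict[str, int],
                 max_tries: int = MAX_TRIES) -> str | None:
    """Delete the oldest unlocked entry to make room.

    `queue` maps key -> lock count, oldest first. Returns the deleted
    key, or None if the pass gave up after too many locked entries.
    """
    tries = max_tries
    for key, count in queue.items():
        if count == 0:
            del queue[key]   # safe: we stop iterating right after this
            return key
        tries -= 1
        if tries == 0:
            break            # too many locked entries in a row
    return None


# 25 locked entries sit in front of one that could be deleted.
stuck = OrderedDict((f"locked{i}", 1) for i in range(25))
stuck["idle"] = 0
print(forced_purge(stuck))    # None: nothing freed, cache stays full

healthy = OrderedDict([("locked0", 1), ("idle", 0)])
print(forced_purge(healthy))  # idle
```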
The most trivial reason for such messages is abnormally killed
nginx processes. That is, if some processes die due to bugs, or are
killed by an unwary administrator or an incorrect script, the
problem will appear sooner or later.
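Why a killed process leaves an entry locked for good can be shown with a small POSIX-only Python sketch. It illustrates the general reference-counting problem and is not nginx code: a child process increments a counter in shared memory, standing in for locking a cache entry, and is killed with SIGKILL before it can decrement it. Nothing ever undoes the increment.

```python
import mmap
import os
import signal
import struct

SIZE = struct.calcsize("i")
# Anonymous shared mapping: parent and forked child see the same bytes.
shm = mmap.mmap(-1, SIZE)


def read_count() -> int:
    return struct.unpack("i", shm[:SIZE])[0]


def write_count(value: int) -> None:
    shm[:SIZE] = struct.pack("i", value)


ready_r, ready_w = os.pipe()
pid = os.fork()
if pid == 0:                       # child: a "worker" serving a request
    write_count(read_count() + 1)  # take a reference ("lock" the entry)
    os.write(ready_w, b"x")        # tell the parent the entry is locked
    signal.pause()                 # still busy with the request...
    write_count(read_count() - 1)  # ...release: never reached if killed
    os._exit(0)

os.read(ready_r, 1)                # wait until the child holds the lock
os.kill(pid, signal.SIGKILL)       # abnormal termination, no cleanup
os.waitpid(pid, 0)
print("count after the worker was killed:", read_count())
```

The parent prints a count of 1: the only process that could have released the reference no longer exists.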
To further debug things, try the following:
- restart nginx and record the pids of all nginx processes;
- once the problem starts to appear again, check if the same
  processes are still running;
- if some processes differ from the ones recorded, dig further to
  find out why.
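The first two steps can be scripted. The sketch below is Linux-specific and assumes the binary is called `nginx`, so its processes show up in /proc with that command name (their titles, like "nginx: worker process", appear in `cmdline`); the snapshot path and the `save`/`check` subcommands are made up. Keep in mind that a configuration reload or a binary upgrade legitimately replaces the worker and cache manager processes, so new pids are only suspicious if nobody reloaded nginx in between.

```python
from __future__ import annotations

import json
import sys
from pathlib import Path

SNAPSHOT = Path("/tmp/nginx-pids.json")  # hypothetical location


def nginx_pids(proc: Path = Path("/proc")) -> dict[int, str]:
    """Map pid -> process title for every process named "nginx"."""
    found: dict[int, str] = {}
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            if (entry / "comm").read_text().strip() != "nginx":
                continue
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process went away while we were looking
        title = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        found[int(entry.name)] = title
    return found


def compare(before: dict[int, str], after: dict[int, str]):
    """Return (processes that disappeared, processes that are new)."""
    gone = {pid: t for pid, t in before.items() if pid not in after}
    new = {pid: t for pid, t in after.items() if pid not in before}
    return gone, new


def main(argv: list[str]) -> int:
    if argv[1:] == ["save"]:             # right after the restart
        SNAPSHOT.write_text(json.dumps(nginx_pids()))
        return 0
    if argv[1:] == ["check"]:            # once the alerts show up again
        saved = json.loads(SNAPSHOT.read_text())
        before = {int(pid): title for pid, title in saved.items()}
        gone, new = compare(before, nginx_pids())
        for pid, title in gone.items():
            print(f"gone: {pid} {title}")
        for pid, title in new.items():
            print(f"new:  {pid} {title}")
        return 1 if gone or new else 0
    print(f"usage: {argv[0]} save|check", file=sys.stderr)
    return 2


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Run it with `save` right after restarting nginx and with `check` once the alerts reappear; a non-zero exit status from `check` means the set of nginx processes changed.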
Some trivial checks, like looking into logs for "worker process
exited ..." messages and checking whether the problem persists
without 3rd party modules compiled in (see "nginx -V"), may also
help.
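Both checks are easy to automate. The sketch below scans error logs for child processes that exited on a signal or with a non-zero code, assuming messages of the form `worker process 12345 exited on signal 9` (the same pattern covers the cache manager process). It also lists modules added at build time via `--add-module=` or `--add-dynamic-module=` in the `configure arguments:` line that `nginx -V` prints. The function names are hypothetical, and the log paths are whatever your `error_log` directives point to.

```python
from __future__ import annotations

import re
import shlex
import subprocess
import sys

EXIT_RE = re.compile(
    r"(?P<proc>[a-z ]*process) (?P<pid>\d+) exited "
    r"(?:on signal (?P<signal>\d+)|with code (?P<code>\d+))"
)


def abnormal_exits(lines):
    """Yield (process, pid, reason) for children that did not exit cleanly."""
    for line in lines:
        m = EXIT_RE.search(line)
        if m is None:
            continue
        if m["signal"] is not None:
            yield m["proc"].strip(), int(m["pid"]), f"signal {m['signal']}"
        elif m["code"] != "0":
            yield m["proc"].strip(), int(m["pid"]), f"code {m['code']}"


def third_party_modules(nginx_v_output: str) -> list[str]:
    """Return the paths given to --add-module / --add-dynamic-module."""
    for line in nginx_v_output.splitlines():
        if line.startswith("configure arguments:"):
            args = shlex.split(line.split(":", 1)[1])
            return [a.split("=", 1)[1] for a in args
                    if a.startswith(("--add-module=",
                                     "--add-dynamic-module="))]
    return []


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:            # error log file(s)
        with open(path, errors="replace") as log:
            for proc, pid, reason in abnormal_exits(log):
                print(f"{path}: {proc} {pid} exited: {reason}")
    # `nginx -V` prints its build information to stderr.
    info = subprocess.run(["nginx", "-V"], capture_output=True, text=True)
    for module in third_party_modules(info.stderr):
        print("third-party module:", module)
```

Only modules compiled in through configure arguments show up in the second list; if it is non-empty, rebuilding without them is the way to rule them out.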
--
Maxim Dounin
http://nginx.org/