Cache manager occasionally stops deleting cached files

Fri Feb 19 13:00:54 UTC 2016

Maxim Dounin Wrote:
-------------------------------------------------------
> Hello!

Hello and thanks for the reply!

> > I assume the mentioned error is due to relatively frequent nginx restarts
> > and is benign. There's nothing else in the error log (except for occasional
> > upstream timeouts). I'm aware this likely isn't enough info to debug the
> > issue, but do you at least have some ideas on what might be causing this
> > issue, or where to look? My wild guess is that the cache manager waits for
> > some lock to be released, but it never gets released, so it just waits
> > indefinitely.
> 
> The error is logged when nginx is going to remove an inactive 
> cache entry but finds it locked by some requests.  Unless the 
> inactive time is very low (not your case), it indicates a problem 
> somewhere else.
> 
> Such locked entries can't be removed from cache.  Additionally, 
> once there are enough such locked entries, nginx won't be able to 
> purge cache based on max_size.  That is, it's expected that nginx 
> will have problems with removing entries from cache if you see 
> such messages.
>
> The most trivial reason for such messages is abnormally killed nginx 
> processes.  That is, if some processes die due to bugs, or are killed 
> by an unwary administrator or an incorrect script, the problem 
> will appear sooner or later.
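The mechanism described above can be illustrated with a toy Python model. This is purely illustrative and not nginx's implementation: each entry carries a hypothetical `count` of requests using it, a process killed mid-request never decrements it, and each purge attempt only inspects a few of the oldest entries. The `scan_limit` parameter is an assumption made to mirror the statement that "enough" locked entries stop max_size purging altogether; all names are hypothetical.

```python
import itertools
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Entry:
    size: int
    count: int = 0  # requests currently using ("locking") this entry


class ToyCache:
    """Toy LRU cache whose eviction must skip locked entries.

    Not nginx's code; `scan_limit` is a hypothetical knob limiting how many
    of the oldest entries a single purge attempt looks at.
    """

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()  # key -> Entry, least recently used first

    def total_size(self):
        return sum(e.size for e in self.entries.values())

    def add(self, key, size):
        self.entries[key] = Entry(size)
        self.entries.move_to_end(key)

    def lock(self, key):
        # A request starts using the entry.
        self.entries[key].count += 1

    def unlock(self, key):
        # The request finished normally. A process killed mid-request never
        # reaches this point, so its lock is never released.
        self.entries[key].count -= 1

    def purge_one(self, scan_limit):
        """Evict the oldest unlocked entry among the first `scan_limit` ones."""
        for key, entry in list(itertools.islice(self.entries.items(), scan_limit)):
            if entry.count == 0:
                del self.entries[key]
                return key
        return None  # every scanned entry is locked

    def enforce_max_size(self, scan_limit):
        """Return True if the cache fits in max_size, False if purging got stuck."""
        while self.total_size() > self.max_size:
            if self.purge_one(scan_limit) is None:
                return False
        return True
```

For example, with `max_size=3`, five entries of size 1, and the two oldest locked by processes that were then killed, `enforce_max_size(scan_limit=2)` returns `False` and nothing is removed: the "stops deleting cached files" symptom in miniature.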

I see. I do get 1000-2000 such errors in the log per day, definitely more
than a couple of months ago. I remember the server crashed in the past, but
not recently.

> To further debug things, try the following:
> 
> - restart nginx and record pids of all nginx processes;
> 
> - once the problem starts to appear again, check if there are the 
>   same processes running;
> 
> - if some processes differ from the ones recorded, dig further to 
>   find out why.
> 
> Some trivial things like looking into logs for "worker process 
> exited ..." messages and checking if the problem persists without 
> 3rd party modules compiled in (see "nginx -V") may also help.
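The two checks suggested above can be sketched in Python. This is a minimal sketch under several assumptions: a Linux `/proc` filesystem, nginx process titles starting with "nginx" (e.g. "nginx: worker process"), and crash messages in the error log containing text of the form "worker process <pid> exited on signal <n>". The function names are hypothetical.

```python
import os
import re
from collections import Counter

# Assumed log format, e.g. "... worker process 1234 exited on signal 11 (core dumped)".
WORKER_EXIT_RE = re.compile(r"worker process (\d+) exited on signal (\d+)")


def nginx_pids(proc_root="/proc"):
    """Return {pid: cmdline} for processes whose command line starts with 'nginx'."""
    result = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip /proc entries that are not processes
        try:
            with open(os.path.join(proc_root, name, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited between listdir() and open()
        # cmdline is NUL-separated (and may be NUL-padded); make it readable.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if cmdline.startswith("nginx"):
            result[int(name)] = cmdline
    return result


def compare(recorded, current):
    """Return (gone, new): PIDs missing since the snapshot, and PIDs that appeared."""
    gone = {pid: cmd for pid, cmd in recorded.items() if pid not in current}
    new = {pid: cmd for pid, cmd in current.items() if pid not in recorded}
    return gone, new


def count_worker_crashes(log_lines):
    """Count 'worker process ... exited on signal N' messages, keyed by signal number."""
    counts = Counter()
    for line in log_lines:
        m = WORKER_EXIT_RE.search(line)
        if m:
            counts[int(m.group(2))] += 1
    return counts
```

Call `nginx_pids()` right after the restart and save the result (for example with `json.dump`, keeping in mind that JSON turns the integer keys into strings), then later pass the saved snapshot and a fresh `nginx_pids()` to `compare()`. A configuration reload also starts new worker processes, so PID changes are only suspicious if no reload happened in between; `count_worker_crashes()` over the error log lines then shows how many workers died on a signal.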

Thanks, I'll dig deeper. I do have 3rd party modules, and there are
occasional messages such as "worker process exited on signal 11". They are
rare, but I'll try to figure out what causes them, though it'll take time.
However, since this is already happening, is it possible to somehow unlock all
entries and start clean, but without removing all cached content? Or
alternatively, can I delete the locked files manually as a workaround?

Regards,
Vedran

Posted at Nginx Forum: https://forum.nginx.org/read.php?2,264599,264626#msg-264626