[crit] 16665#0 unlink()
jim at ohlste.in
Mon May 6 13:01:45 UTC 2013
On 05/05/13 16:32, Maxim Dounin wrote:
> On Sat, May 04, 2013 at 07:08:55PM -0400, Jim Ohlstein wrote:
>> I have just seen a similar situation using fastcgi cache. In my case
>> I am using the same cache (but only one cache) for several
>> server/location blocks. The system is a fairly basic nginx set up
>> with four upstream fastcgi servers and ip hash. The returned content
>> is cached locally by nginx. The cache is rather large but I wouldn't
>> think this would be the cause.
>> fastcgi_cache_path /var/nginx/fcgi_cache levels=1:2
>> keys_zone=one:512m max_size=250g inactive=24h;
>> The other server/location blocks are pretty much identical insofar as
>> fastcgi and cache are concerned.
>> When I upgraded nginx using the "on the fly" binary upgrade method,
>> I saw almost 400,000 lines in the error log that looked like this:
>> 2013/05/04 17:54:25 [crit] 65304#0: unlink()
>> "/var/nginx/fcgi_cache/7/2e/899bc269a74afe6e0ad574eacde4e2e7" failed
>> (2: No such file or directory)
> After binary upgrade there are two cache zones - one in old nginx,
> and another one in new nginx (much like in originally posted
> configuration). This may cause such errors if e.g. a cache file
> is removed by old nginx, and new nginx fails to remove the file
> shortly after.
> The 400k lines are a bit too many though. You may want to check
> that the cache wasn't just removed by some (package?) script
> during the upgrade process. Alternatively, it might indicate that
> you let old and new processes coexist for a long time.
I hadn't considered that there are two zones during that short time.
Thanks for pointing that out.
To my knowledge, there are no scripts or packages which remove files
from the cache, or the entire cache. A couple of minutes after this
occurred there were a bit under 1.4 million items in the cache and it
was "full" at 250 GB. I did look in a few sub-directories at the time,
and most of the items were time stamped from before this started so
clearly the entire cache was not removed. During the time period these
entries were made in the error log, and in the two minutes after, access
log entries show the expected ratio of "HIT" and "MISS" entries which
further supports your point below that these are harmless (although I
still don't have a clear cause).
I'm not sure what you mean by a "long time", but all of these entries
are time stamped over roughly two and a half minutes.
> On the other hand, as discussed many times - such errors are more
> or less harmless as long as it's clear what caused cache files to
> be removed. At worst they indicate that the information in the cache
> zone isn't correct and max_size might not be maintained properly,
> and eventually nginx will self-heal the cache zone. It probably
> should be logged at the [error] or even [warn] level instead.
Why would max_size not be maintained properly? Isn't that the
responsibility of the cache manager process? Are there known issues or
bugs?
Thank you for your response and assistance.