Weird timeouts, not sure if I've set the right thresholds

Rob Mueller robm at fastmail.fm
Sat May 3 05:14:14 MSD 2008


> Can anyone explain the prejudice against NFS?

NFS *can* cause blocking problems.

I haven't done a detailed analysis of how nginx serves files, but I've 
seen NFS flakiness cause massive problems: lots of processes trying to 
access the NFS share end up "locking up" while they wait for the NFS server 
to respond. Because the wait happens inside the kernel (eg when the process 
is in a read() call, or accessing an mmap()ed region of a file), the 
processes are almost completely uninterruptible.
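
On Linux, processes stuck this way normally show up in state "D" 
(uninterruptible sleep) in ps and in /proc/<pid>/stat. A minimal sketch for 
listing them follows; the helper names parse_stat_line and 
uninterruptible_processes are made up for illustration, and it assumes a 
Linux-style /proc.

```python
import os


def parse_stat_line(text):
    """Split the start of a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so find the last ')' instead of splitting on whitespace.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren].strip())
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

A handful of nginx workers sitting in D for seconds at a time while the 
timeouts happen would point at blocked file IO rather than at nginx itself.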

With nginx, this would be even worse. nginx uses a small process count + 
non-blocking event loop model for serving files. If something causes that 
loop to "block" (eg waiting on the NFS server), nginx will basically freeze 
up and stop serving files completely. In theory, using epoll() and 
sendfile() should push that blocking down into the kernel, where it 
shouldn't affect nginx, but as I said, I haven't done a detailed analysis. 
Even a call like stat(), which can't be made non-blocking, can block badly 
on an NFS-mounted file and freeze an entire nginx process.

Some things then to check.

How does nginx handle file IO?

If you're not using sendfile(), does nginx use read() with O_NONBLOCK? Does 
the Linux kernel block a read() call on an NFS file if the NFS server is 
having problems, even if you're using O_NONBLOCK?
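
For what it's worth, the Linux open(2) man page says O_NONBLOCK has no 
effect on regular files, and a file on an NFS mount is a regular file as 
far as that rule goes. The sketch below only shows that the flag is 
accepted and that read() then behaves like an ordinary read on a local 
temporary file; it does not reproduce an NFS stall. The function names are 
illustrative.

```python
import os
import tempfile


def read_with_nonblock(path, size=4096):
    """Open a regular file with O_NONBLOCK set and read from it."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        # On a regular file this is an ordinary read: it returns the data
        # once the kernel has it, however long the backing store takes.
        return os.read(fd, size)
    finally:
        os.close(fd)


def demo():
    """Write a small temporary file and read it back with O_NONBLOCK."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
        path = tmp.name
    try:
        return read_with_nonblock(path)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```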

If you're using sendfile() on an NFS file, does the Linux kernel block the 
call if there's a problem accessing the NFS server?

Does nginx use stat() calls to verify that a file/path exists? Does the 
Linux kernel block a stat() call on an NFS file if the NFS server is having 
problems?
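
One rough way to answer the stat() question from outside nginx is to time 
the call yourself against a path on the mount while the problem is 
happening. A minimal sketch, with timed_stat as a made-up helper name; note 
that if the server really is unresponsive, this probe will hang in the same 
way, which is itself the answer.

```python
import os
import sys
import time


def timed_stat(path):
    """Return (exists, seconds) for a single os.stat() call on path."""
    start = time.monotonic()
    try:
        os.stat(path)
        exists = True
    except FileNotFoundError:
        exists = False
    # Other errors (for example a stale NFS handle) are left to propagate.
    return exists, time.monotonic() - start


if __name__ == "__main__":
    # Usage: python timed_stat.py /path/on/the/nfs/mount
    exists, seconds = timed_stat(sys.argv[1])
    print(f"exists={exists} stat took {seconds:.6f}s")
```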

mike: Can we get an strace (run something like "strace -T -p nginxpid -o 
/tmp/st.out") of one of your nginx processes for about 30 seconds while 
you're having problems? Trace one of the child serving processes, NOT the 
parent; I want to see what a worker is doing. The -T option records the 
time spent in each system call, so the output should show whether any calls 
are taking a long time and causing problems.
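
Once the trace exists, the slow calls can be picked out by hand or with a 
small script. The sketch below assumes the usual "strace -T" line shape, 
where each completed call ends with its duration in angle brackets; the 
0.1 second threshold and the function name slow_syscalls are arbitrary 
choices for illustration. epoll_wait is skipped by default because an idle 
worker legitimately spends a long time in it.

```python
import re
import sys

# Matches e.g.:  stat("/some/path", {...}) = 0 <2.731005>
SYSCALL_RE = re.compile(r"^(\w+)\(.*<(\d+\.\d+)>\s*$")


def slow_syscalls(lines, threshold=0.1, ignore=("epoll_wait",)):
    """Return (name, seconds, line) for calls slower than threshold."""
    slow = []
    for line in lines:
        match = SYSCALL_RE.match(line)
        if not match:
            continue  # unfinished/resumed lines, signals, exit markers
        name, seconds = match.group(1), float(match.group(2))
        if name in ignore or seconds < threshold:
            continue
        slow.append((name, seconds, line.rstrip()))
    # Slowest first.
    return sorted(slow, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Usage: python slow_syscalls.py /tmp/st.out
    with open(sys.argv[1]) as trace:
        for name, seconds, line in slow_syscalls(trace):
            print(f"{seconds:9.6f}  {line}")
```

If stat(), open(), read() or sendfile() calls on NFS paths dominate that 
list, the NFS server is the likely cause of the timeouts.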

Rob





