Weird timeouts, not sure if I've set the right thresholds

Sat May 3 13:04:55 MSD 2008

On Sat, May 03, 2008 at 03:14:05AM -0400, Denis S. Filimonov wrote:

> On Saturday 03 May 2008 02:04:37 Igor Sysoev wrote:
> > On Fri, May 02, 2008 at 04:44:21PM -0400, Denis S. Filimonov wrote:
> > > Can anyone explain the prejudice against NFS?
> > >
> > > Specifically, why would an additional proxy hop be faster than serving
> > > files from NFS?
> > > I can see two points in favor of NFS:
> > > - NFS client caches files while Nginx doesn't (yet)
> > > - Nginx doesn't support keepalive connections to upstream, hence
> > > additional latencies and traffic for TCP handshake/finalization. NFS
> > > doesn't have this issue since it typically works over UDP.
> > >
> > > I do have a couple boxes serving a lot of traffic (mostly PHP) from NFS.
> > > It works just fine, though it did take some NFS tuning.
> >
> > All filesystem read operations are blocking operations, i.e. if a file
> > page is not in the VM cache, the process must wait for it. The only
> > exception is aio_read(), but it has its own drawbacks. A local filesystem
> > with non-faulty disks has a constant blocking time: about 10-20ms, the
> > seek time. NFS may block longer.
> >
> That's only true under the assumption of an empty I/O queue.
> 
> At any rate, assuming the NFS server has the same disk latency, it adds the 
> network latency on top of that (CPU time is negligible compared to disk 
> seek). A round trip on a lightly loaded, properly configured network takes
> under 1ms, i.e. an order of magnitude less than a disk seek.

I agree, but any packet loss, retransmission, etc. will affect the whole worker.
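
As a rough illustration of why one stalled read hurts a whole worker, here is a minimal sketch in Python. It is a toy model, not nginx code: the function name `completion_times` and all the timings are assumptions chosen for illustration. It only shows that when a single process serves requests one after another, every request queued behind a blocking read waits for it.

```python
def completion_times(service_us):
    """Return the finish time (in microseconds) of each request when a
    single worker serves them strictly one after another, so a blocking
    read delays everything queued behind it."""
    finished = []
    clock = 0
    for cost in service_us:
        clock += cost
        finished.append(clock)
    return finished


# Hypothetical timings: a read served from the VM cache costs 100 us,
# while one NFS read stalls for 1 s (for example, a retransmission).
all_cached = [100, 100, 100, 100, 100]
one_stall = [100, 1_000_000, 100, 100, 100]

print(completion_times(all_cached))
print(completion_times(one_stall))
```

In the second run the three cached requests behind the stalled one each finish more than a second late, although each needs only 100 us of work.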

Also, as I understand it, in modern Linux it's not easy to find the cause
of a stall: in FreeBSD, top/ps will show you that a process is waiting on
the "nfsrcv" WCHAN or so. Probably, modern NFS in FreeBSD has become much
better (I have seen many NFS commits from Yahoo!, Isilon, etc. developers),
but in one old setup (FreeBSD 4 as client and FreeBSD 3 as server), I saw
many Apache stalls after something in NFS went wrong.

> > And a blocked nginx worker cannot handle its other connections, which
> > could be served quickly from the VM cache, etc. You do not see this in
> > the PHP case, because each PHP process handles only a single connection
> > at a time.
> 
> That's true, however one can increase the number of workers in proportion
> to the latency increase to achieve the same level of concurrency; in this
> case it's only 10%. That's really not a problem.
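
The 10% figure can be reproduced with a small calculation. This is only a sketch under stated assumptions: a 10 ms disk seek and a 1 ms network round trip (illustrative values within the ranges mentioned in this thread), and a hypothetical helper `workers_needed` that is not part of nginx. Times are integers in microseconds to avoid floating-point rounding.

```python
def workers_needed(base_workers, disk_us, network_us):
    """Scale the worker count by the relative latency increase
    (disk + network) / disk, rounding up to a whole worker."""
    total = base_workers * (disk_us + network_us)
    return -(-total // disk_us)  # integer ceiling division


# Assumed values: 10 ms seek, 1 ms round trip, 10 workers to start with.
print(workers_needed(10, 10_000, 1_000))
```

With these assumed numbers the latency ratio is 11/10, so 10 workers become 11. The estimate says nothing about stalls caused by packet loss or retransmission, which are not proportional to the average latency.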
> 
> The problem with NFS happens when all necessary data blocks are cached: a 
> local FS would just happily return the cached data without accessing the disk 
> while the NFS client still issues a request to see if the file has
> changed. Thus, NFS tends to flood the network with tiny requests, and
> that's the cause of its slowness. My point is that in most cases this can
> easily be prevented by relaxing the cache coherency protocol without
> sacrificing safety.
> 
> -- 
> Denis.
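
The effect of relaxing cache coherency described in the quoted message can be sketched with a toy model. This is a simplification, not the algorithm of any real NFS client: it assumes the client revalidates a file's attributes with the server only when its cached copy is at least `attr_timeout_ms` old. The function name and all numbers are hypothetical.

```python
def revalidations(access_times_ms, attr_timeout_ms):
    """Count how many accesses trigger a 'has the file changed?' request,
    assuming cached attributes are trusted for attr_timeout_ms."""
    count = 0
    last_check = None
    for t in access_times_ms:
        if last_check is None or t - last_check >= attr_timeout_ms:
            count += 1
            last_check = t
    return count


# Hypothetical load: one file read every 10 ms for one second.
accesses = list(range(0, 1000, 10))

print(revalidations(accesses, 0))     # no attribute caching
print(revalidations(accesses, 100))   # attributes trusted for 100 ms
print(revalidations(accesses, 3000))  # attributes trusted for 3 s
```

In this model the number of small requests falls from one per access to one per timeout interval. The trade-off is that a change made on the server may go unnoticed for up to the timeout.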

-- 
Igor Sysoev
http://sysoev.ru/en/