Weird timeouts, not sure if I've set the right thresholds

mike mike503 at gmail.com
Sat May 3 07:17:20 MSD 2008


On 5/2/08, Rob Mueller <robm at fastmail.fm> wrote:

> mike: Can we get an strace (run something like "strace -T -p nginxpid -o
> /tmp/st.out") of one of your nginx processes for about 30 seconds (NOT the
> parent process, I want to see what one of the child serving processes is
> doing) while you're having problems. That should show if any system calls
> are taking a long time and causing problems.
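
(For reference, I take that to mean something like the following, with
12345 standing in for one of the worker PIDs and assuming the coreutils
"timeout" utility is available to bound the run at 30 seconds:

    # list nginx processes; the master is the parent, workers serve requests
    ps -o pid,ppid,cmd -C nginx

    # trace one worker for ~30 seconds; -T records time spent in each syscall
    timeout 30 strace -T -p 12345 -o /tmp/st.out
)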

It's going to be difficult to get this going on my production
cluster. Downtime from these problems is already cutting my profit
margin on my largest client down to a loss, so there's little room
for testing on top of that.

I am -not- happy about NFS, but at the moment I don't see any option
for removing it without moving all my clients to some other
application-aware solution like MogileFS, etc. I've looked at AFS/Coda
and some other things, but NFS still seems to be the most widespread
and best-supported option...

I've tried iSCSI+OCFS2. I've tried NFS v4, v3, TCP, UDP, jumbo frames,
32k rsize/wsize, 8k rsize/wsize, all that stuff. I already have plans
to move at least the largest client (using by far the largest % of
resources) to a MogileFS solution for data, with local webserver
copies of scripts/templates (kept in sync via rsync + a "build
script").

I was going to try FreeBSD for the clients as well; then, at least
for now, I'd hopefully have a more stable environment. I mean, I'm
not pushing that much data. People run NFS with thousands of clients
pushing far more traffic. I think I'm pushing maybe 3MB/sec total
during peak time across 3 heavy and 2 light clients.

Would there be a way to set up nginx to dump some extended debugging
during issue time (and not normally, or I'd have logfiles too large
to look through)?
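
(The closest thing I know of is the debug_connection directive, but
that's keyed to client addresses rather than to when the problem is
happening, and it needs nginx built with --with-debug; the address
below is just an example:

    error_log  /var/log/nginx/error.log;

    events {
        worker_connections  1024;
        # connections from this address get debug-level logging,
        # everything else stays at the normal error_log level
        debug_connection    192.168.1.1;
    }
)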




