Socket connection failures on 1.6.1~precise
jon.clayton at rackspace.com
Wed Sep 10 03:15:20 UTC 2014
Just closing the loop on this: what appeared to be happening was that
on newly created nodes the nginx master process was not starting up
with the custom ulimit set in /etc/security/limits.d/. The workers were
all fine since worker_rlimit_nofile was set in nginx.conf, but I was
running into a separate issue that was preventing the nginx master
process from inheriting the custom ulimit setting.
Truth be told, I never quite nailed down an exact RCA beyond ensuring
that the nginx master process came up with the custom ulimit setting.
That would seem to indicate something was causing a spike in the number
of open files for the master process, but I can look into that separately.
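
For anyone checking the same thing: on Linux, the limits a running
process actually ended up with can be read from /proc/<pid>/limits.
Below is a minimal Python sketch, assuming a Linux host. The function
names are made up for illustration, and the pid of the nginx master has
to be supplied by the caller.

```python
from pathlib import Path


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    A value of 'unlimited' is returned as None.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: "Max open files  <soft>  <hard>  files"
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' row found")


def max_open_files(pid):
    """Read the open-files limits of a running process (Linux only)."""
    return parse_max_open_files(Path(f"/proc/{pid}/limits").read_text())
```

Calling max_open_files() with the master's pid, and again with a
worker's pid, shows whether the two came up with different limits.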
On 09/02/2014 03:35 PM, Jon Clayton wrote:
> I did see the changelog hadn't noted many changes and running a diff
> of the versions shows what you mentioned regarding the 400 bad request
> handling code. I'm not necessarily stating that nginx is the problem,
> but it would seem like something had changed enough to cause the
> backend's backlog to fill more rapidly.
> That could be a completely bogus statement as I've been attempting to
> find a way to track down exactly what backlog is being filled, but my
> test of downgrading nginx back to 1.6.0 from the nginx ppa seemed to
> also point at a change in nginx causing the issue since the errors did
> not persist after downgrading.
> It's very possible that I'm barking up the wrong tree, but the fact
> that only changing nginx versions back down to 1.6.0 from 1.6.1
> eliminated the errors seems suspicious. I'll keep digging, but I'm
> open to any other suggestions.
> On 09/02/2014 02:14 PM, Maxim Dounin wrote:
>> On Tue, Sep 02, 2014 at 11:00:10AM -0500, Jon Clayton wrote:
>>> I'm trying to track down an issue that is being presented only when I
>>> run nginx version 1.6.1-1~precise. My nodes running 1.6.0-1~precise do
>>> not display this issue, but freshly created servers are getting floods
>>> of these socket connection issues a couple times a day.
>>> /connect() to unix:/tmp/unicorn.sock failed (11: Resource temporarily
>>> unavailable) while connecting to upstream/
>>> The setup I'm working with is nginx proxying requests to a unicorn
>>> powered by a ruby app. As stated above, the error is NOT present on
>>> running 1.6.0-1~precise, but any newly created node gets the newer
>>> 1.6.1-1~precise package installed and will inevitably have that error.
>>> All settings from nodes running 1.6.0 appear to be the same as on newly
>>> created nodes on 1.6.1 in terms of sysctl settings, nginx settings, and
>>> unicorn settings. All package versions are the same except for nginx. When I
>>> downgraded one of the newly created nodes to nginx 1.6.0 using the
>>> nginx ppa
>>> (https://launchpad.net/~nginx/+archive/ubuntu/stable), the error was not
>>> present.
>>> Is there any advice, direction, or similar issue experienced that someone
>>> else might be able to help me track this down?
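
To see when the floods of connect() errors described above actually
occur, the error log can be bucketed by hour. This is a rough Python
sketch, not part of the original thread. It assumes the standard nginx
error_log line prefix ("YYYY/MM/DD HH:MM:SS [level] ..."); the function
name and the log path in the usage note are assumptions.

```python
import re
from collections import Counter

# nginx error_log lines begin with "YYYY/MM/DD HH:MM:SS [level] ...".
# Capture the date and hour only, so errors are grouped per hour.
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}):\d{2}:\d{2} \[error\]")

# The message fragment quoted in this thread.
NEEDLE = "failed (11: Resource temporarily unavailable) while connecting to upstream"


def errors_per_hour(lines):
    """Count upstream EAGAIN errors per 'YYYY/MM/DD HH' bucket."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and NEEDLE in line:
            counts[match.group(1)] += 1
    return counts
```

With a log at, say, /var/log/nginx/error.log (an assumed path), passing
the open file object to errors_per_hour() and printing the sorted items
gives one count per hour, which can then be lined up against load or
deploy times.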
>> Just some information:
>> - In nginx itself, the difference between 1.6.0 and 1.6.1 is fairly
>> minimal. The only change affecting http is one code line added
>> in the 400 Bad Request handling code
>> (see http://hg.nginx.org/nginx/rev/b8188afb3bbb).
>> - The message suggests that the backend's backlog is full. This can
>> easily happen on load spikes and/or if a backend is overloaded,
>> and is usually unrelated to nginx itself.
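
The "backlog is full" condition can be reproduced in isolation. The
Python sketch below is an illustration under stated assumptions, not
anything from nginx or unicorn: it assumes Linux, where a non-blocking
connect() to a unix stream socket fails with EAGAIN (errno 11) once the
listener's accept queue is full. The function name, the socket path,
and the backlog value are all hypothetical.

```python
import errno
import os
import socket
import tempfile


def fill_backlog(backlog=1, max_attempts=64):
    """Connect to a listener that never accepts until connect() fails.

    Returns (successful_connects, errno_of_failure), or (n, None) if no
    connect failed within max_attempts.
    """
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.sock")  # hypothetical socket path
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(backlog)  # the listener never calls accept()
    clients = []
    try:
        for _ in range(max_attempts):
            client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            client.setblocking(False)  # fail immediately instead of waiting
            clients.append(client)
            try:
                client.connect(path)
            except OSError as exc:
                return len(clients) - 1, exc.errno
        return len(clients), None
    finally:
        for client in clients:
            client.close()
        server.close()
        os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    ok, err = fill_backlog()
    print(ok, errno.errorcode.get(err))
```

The point of the sketch is only that the error comes from the listening
side not draining its queue fast enough, independent of which client
is connecting.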