[PATCH 4 of 4] AIO operations now add timers (ticket #2162)
Maxim Dounin
mdounin at mdounin.ru
Tue Jan 9 05:59:14 UTC 2024
Hello!
On Mon, Jan 08, 2024 at 01:31:11PM +0000, J Carter wrote:
> On Mon, 8 Jan 2024 11:25:55 +0000
> J Carter <jordanc.carter at outlook.com> wrote:
>
> > Hello,
> >
> > On Mon, 27 Nov 2023 05:50:27 +0300
> > Maxim Dounin <mdounin at mdounin.ru> wrote:
> >
> > > # HG changeset patch
> > > # User Maxim Dounin <mdounin at mdounin.ru>
> > > # Date 1701050170 -10800
> > > # Mon Nov 27 04:56:10 2023 +0300
> > > # Node ID 00c3e7333145ddb5ea0eeaaa66b3d9c26973c9c2
> > > # Parent 61d08e4cf97cc073200ec32fc6ada9a2d48ffe51
> > > AIO operations now add timers (ticket #2162).
> > >
> > > Each AIO (thread IO) operation being run is now accompanied by a
> > > 1-minute timer. This timer prevents unexpected shutdown of the worker
> > > process while an AIO operation is running, and logs an alert if the
> > > operation is running for too long.
> >
> > Shouldn't this timer's duration be set to match worker_shutdown_timeout's
> > duration rather than being hardcoded to 60s?
>
> Ah nevermind, I understand.
>
> These timers will either expire after the set 60s duration passes, or
> expire when worker_shutdown_timeout itself expires, which kills the
> connection and times out associated timers (including the AIO timers).
>
> Setting it to worker_shutdown_timeout's duration would be pointless
> (an 'infinite' timer would give the same result).
>
> So the only situation in which a different value for these AIO
> timers would make sense is if these AIO operations are expected to
> take longer than 60s, but less than worker_shutdown_timeout (in cases
> where it has been increased from its own default of 60s).
>
> In that case the AIO operation's timeout would have to be set via one
> (or more) of its own directives, with a value less than
> worker_shutdown_timeout.
Not really.
When worker_shutdown_timeout expires, it tries to terminate the
request, but it can't as long as an AIO operation is running.
When the AIO operation completes, the request will actually be
terminated and the worker process will be allowed to exit. So far
so good.
But if the AIO operation never completes, the timer will expire
after 1 minute, an alert will be logged, and the worker process
will be allowed to exit anyway (with the request still present).
This might not actually be possible, though - for example,
ngx_thread_pool_exit_worker() will just block waiting for all
pending operations to complete.
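
For illustration, the pattern in question looks roughly like this
(a simplified sketch, not the actual patch; my_aio_start() and
my_aio_event_handler() are made-up names, the rest is the usual
nginx event and thread pool API):

    #include <ngx_config.h>
    #include <ngx_core.h>
    #include <ngx_event.h>
    #include <ngx_thread_pool.h>

    static void my_aio_event_handler(ngx_event_t *ev);

    static ngx_int_t
    my_aio_start(ngx_thread_pool_t *tp, ngx_thread_task_t *task)
    {
        task->event.handler = my_aio_event_handler;

        if (ngx_thread_task_post(tp, task) != NGX_OK) {
            return NGX_ERROR;
        }

        /* a pending timer both delays worker shutdown and guards
         * against operations which never complete */

        ngx_add_timer(&task->event, 60000);

        return NGX_OK;
    }

    static void
    my_aio_event_handler(ngx_event_t *ev)
    {
        if (ev->timedout) {

            /* the guard timer expired: the operation has been
             * running for more than a minute; nothing meaningful
             * can be done here except complaining loudly */

            ngx_log_error(NGX_LOG_ALERT, ngx_cycle->log, 0,
                          "aio operation took too long");
            ev->timedout = 0;
            return;
        }

        /* normal completion: remove the guard timer, so the worker
         * process is again free to exit, and resume request
         * processing */

        if (ev->timer_set) {
            ngx_del_timer(ev);
        }

        /* ... resume request processing here ... */
    }
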
In theory, the situation where an AIO operation never completes
should never happen, and just a counter of pending AIO
operations could be used instead to delay shutdown (which is
essentially equivalent to an infinite timer).
In practice, though, using a large enough guard timer might be
better: it provides an additional level of protection against bugs
or system misbehaviour, and at least logs an alert if something
really weird happens. It also looks more universal and in line
with the current approach of using existing timers as an indicator
that something is going on and shutdown should be delayed.
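
For reference, the existing check looks roughly like this
(paraphrased from ngx_worker_process_cycle(), most details omitted):

    for ( ;; ) {

        if (ngx_exiting) {

            /* while exiting, the worker keeps running its event
             * loop as long as any non-cancelable timer is pending -
             * such as the guard timer sketched above */

            if (ngx_event_no_timers_left() == NGX_OK) {
                ngx_log_error(NGX_LOG_NOTICE, cycle->log, 0,
                              "exiting");
                ngx_worker_process_exit(cycle);
            }
        }

        ngx_process_events_and_timers(cycle);

        /* ... termination and reconfiguration handling omitted ... */
    }
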
The timer is expected to be "large enough": since we cannot do
anything meaningful with an AIO operation which never completes,
and can only complain loudly, the timer should never expire
unless there is something really wrong. This is not the case with
worker_shutdown_timeout: it can be set to an arbitrarily low value,
and its expiry is not expected to mean that something is really
wrong, but rather that nginx shouldn't try to process remaining
requests and should instead close them as soon as it can. That is,
worker_shutdown_timeout does not fit semantically. Further,
worker_shutdown_timeout is not set by default, so it simply cannot
be used.
The 1-minute value was chosen as it matches the default send_timeout,
which typically accompanies AIO operations when sending responses
(and also delays shutdown, so no "open socket left" alerts are
normally seen). Still, send_timeout is also quite different
semantically, and therefore a hardcoded value is used instead.
I don't think there are valid cases where AIO operations can take
longer than 1 minute, as these are considered to be small and fast
operations, which are normally done synchronously within the nginx
event loop when not using AIO, and an operation which takes 1 minute
would mean nginx is completely unresponsive. Still, if we find out
that 1 minute is not large enough in some cases, we can just bump
it to a larger value.
[...]
--
Maxim Dounin
http://mdounin.ru/