[PATCH 4 of 4] AIO operations now add timers (ticket #2162)
J Carter
jordanc.carter at outlook.com
Tue Jan 9 15:01:31 UTC 2024
Hello,
On Tue, 9 Jan 2024 08:59:14 +0300
Maxim Dounin <mdounin at mdounin.ru> wrote:
> Hello!
>
> On Mon, Jan 08, 2024 at 01:31:11PM +0000, J Carter wrote:
>
> > On Mon, 8 Jan 2024 11:25:55 +0000
> > J Carter <jordanc.carter at outlook.com> wrote:
> >
> > > Hello,
> > >
> > > On Mon, 27 Nov 2023 05:50:27 +0300
> > > Maxim Dounin <mdounin at mdounin.ru> wrote:
> > >
> > > > # HG changeset patch
> > > > # User Maxim Dounin <mdounin at mdounin.ru>
> > > > # Date 1701050170 -10800
> > > > # Mon Nov 27 04:56:10 2023 +0300
> > > > # Node ID 00c3e7333145ddb5ea0eeaaa66b3d9c26973c9c2
> > > > # Parent 61d08e4cf97cc073200ec32fc6ada9a2d48ffe51
> > > > AIO operations now add timers (ticket #2162).
> > > >
> > > > Each AIO (thread IO) operation being run is now accompanied by a 1-minute
> > > > timer. This timer prevents unexpected shutdown of the worker process while
> > > > an AIO operation is running, and logs an alert if the operation is running
> > > > for too long.
> > >
> > > Shouldn't this timer's duration be set to match worker_shutdown_timeout's
> > > duration rather than being hard-coded to 60s?
> >
> > Ah, never mind - I understand.
> >
> > These timers will either expire after the set 60s duration, or
> > will expire as worker_shutdown_timeout itself expires, kills the
> > connection and times out the associated timers (including the AIO timers).
> >
> > Setting it to worker_shutdown_timeout's duration would be pointless
> > (an 'infinite' timer would give the same result).
> >
> > So the only situation in which a different value for these AIO
> > timers would make sense is if these AIO operations are expected to
> > take longer than 60s, but less than worker_shutdown_timeout (in cases
> > where it has been increased from its own default of 60s).
> >
> > In that case the AIO operation's timeout would have to be set by one
> > (or more) dedicated directives of its own, with a value less than
> > worker_shutdown_timeout.
>
> Not really.
>
> When worker_shutdown_timeout expires, it tries to terminate the
> request, but it can't as long as an AIO operation is running.
> When the AIO operation completes, the request will be actually
> terminated and the worker process will be allowed to exit. So far
> so good.
>
> But if the AIO operation never completes, the timer will expire
> after 1 minute, will log an alert, and the worker process will
> be allowed to exit anyway (with the request still present). This
> might not actually be possible, though - for example,
> ngx_thread_pool_exit_worker() will just block waiting for all
> pending operations to complete.
>
> In theory, the situation when an AIO operation never completes
> should never happen, and just a counter of pending AIO
> operations can be used instead to delay shutdown (which is
> essentially equivalent to an infinite timer).
>
> In practice, though, using a large enough guard timer might be
> better: it provides an additional level of protection against bugs or
> system misbehaviour, and at least logs an alert if something
> really weird happens. It also looks more universal and in line
> with the current approach of using existing timers as an indicator
> that something is going on and shutdown should be delayed.
>
> The timer is expected to be "large enough", since we cannot do
> anything meaningful with an AIO operation which never completes,
> we can only complain loudly, so the timer should never expire
> unless there is something really wrong. This is not the case with
> worker_shutdown_timeout: it can be set to an arbitrarily low value,
> which is not expected to mean that something is really wrong if
> the timer expires, but rather means that nginx shouldn't try to
> process remaining requests, but should instead close them as soon
> as it can. That is, worker_shutdown_timeout does not fit
> semantically. Further, worker_shutdown_timeout is not set by
> default, so it simply cannot be used.
>
Good point - for whatever reason I had it in my mind that it was
set by default (and you're right, it doesn't fit in any case).
> The 1 minute was chosen as it matches the default send_timeout, which
> typically accompanies AIO operations when sending responses (and
> also delays shutdown, so no "open socket left" alerts are normally
> seen). Still, send_timeout is also quite different semantically,
> and therefore a hardcoded value is used instead.
>
> I don't think there are valid cases when AIO operations can take
> longer than 1 minute, as these are considered to be small and fast
> operations, which are normally done synchronously within the nginx
> event loop when not using AIO, and an operation which takes 1 minute
> would mean nginx is completely unresponsive. Still, if we find
> out that 1 minute is not large enough in some cases,
> we can just bump it to a larger value.
>
> [...]
>
Thanks for the detailed explanation - makes sense to me.
More information about the nginx-devel mailing list