[PATCH 5 of 6] Upstream: allow any worker to resolve upstream servers
Aleksei Bavshin
a.bavshin at nginx.com
Thu Feb 9 16:45:11 UTC 2023
On 2/5/2023 7:01 PM, J Carter wrote:
> Hi Aleksei,
>
> Why not permanently assign the task of resolving a given upstream server
> group (all servers/peers within it) to a single worker?
>
> It seems that this approach would resolve the SRV issues, and remove the
> need for the shared queue of tasks.
>
> The load would still be spread evenly for the most realistic scenarios -
> which is where there are many upstream server groups of few servers, as
> opposed to few upstream server groups of many servers.

The intent of the change was exactly the opposite: to avoid any permanent
assignment of periodic tasks to a worker and to allow other processes to
resume resolving if the original assignee exits, whether normally
or abnormally. I'm not even doing enough for that -- I should've kept
in-progress tasks at the end of the queue with expires = resolver
timeout + a small constant, and retried from another process when the
timeout is reached, but I abandoned the idea for a minuscule
improvement in insertion time. I expect to be asked to reconsider, as
patch 6/6 does not cover all the possible situations where we want to
recover a stale task.
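
To illustrate the abandoned idea, here is a minimal sketch. It is not code
from the patch series: task_t, task_claim(), task_complete() and
TASK_GRACE_MS are hypothetical names, the value of the "small constant" is
an assumption, and the queue ordering and shared-memory locking that real
code would need are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch; none of these names exist in nginx or in the patch. */

#define TASK_GRACE_MS  100  /* the "small constant"; the value is an assumption */

typedef struct {
    uint64_t  expires;  /* ms timestamp after which any worker may claim */
    int       owner;    /* id of the worker resolving it, -1 if idle */
} task_t;

/*
 * Try to claim a task for `worker`.  An in-progress task stays claimable:
 * its expiry is pushed to now + resolver timeout + grace, so if the owner
 * exits without completing it, another worker picks it up once that
 * deadline passes.  Returns 1 if claimed, 0 otherwise.
 */
static int
task_claim(task_t *t, int worker, uint64_t now, uint64_t resolver_timeout)
{
    assert(t != NULL);

    if (now < t->expires) {
        return 0;  /* not due yet, or still inside the owner's window */
    }

    t->owner = worker;
    t->expires = now + resolver_timeout + TASK_GRACE_MS;

    return 1;
}

/* Mark a task as resolved and schedule the next run after `valid` ms. */
static void
task_complete(task_t *t, uint64_t now, uint64_t valid)
{
    assert(t != NULL);

    t->owner = -1;
    t->expires = now + valid;
}
```

With this scheme no worker owns a task for longer than one resolve attempt;
a task whose owner disappeared simply becomes due again for whoever checks
the queue next.
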

A permanent assignment of a whole upstream would also require either
notifying other processes that the upstream is no longer assigned if the
worker exits, or consistently recovering that assignment across a restart
of a single worker (e.g. after a crash - not a regular situation, but one we
should take into account nonetheless). And the benefit is not quite
obvious - I mentioned that when resolving SRVs with a lot of records, it
may take longer to update the list of peers, but the contention is
not expected to change significantly* if we pin these tasks to a single
worker, as another worker may be doing the same for another upstream.
Most importantly, this isn't even a bottleneck. It only slightly
exacerbates an existing problem with certain balancers that already
suffer from the overuse of locks, in a configuration that was
specifically crafted to amplify and highlight the difference and is far
from the most realistic scenarios you describe.

* Pending verification on a performance test stand.