[PATCH 0 of 6] Upstream: re-resolvable servers.
a.bavshin at nginx.com
Wed Feb 1 01:36:55 UTC 2023
The series is a compilation of patches with the upstream re-resolve
feature from Nginx Plus. The original commits were rebased on top
of the current OSS code, grouped by the features they introduce, and squashed.
Some formatting quirks and other minor oddities can be attributed to a
conscious effort to reduce divergence from the source branch.
The last couple of patches in the series are new code that allows
sharing name resolution tasks between all the workers.
Known issues and TODOs:
- The whole series is known to be broken on win32 with multiple worker
processes, as it relies on the ngx_worker value to keep track of the
locality of data. Initializing ngx_worker to a correct value should
'noreuse' zones also seem to be unsupported on this platform, so
configuration reload may fail.
- The functionality requires a shared zone of sufficient size to be
configured in the upstream block. A rough estimate is 2k per
configured server entry plus 2k for each resolved address.
The zone requirement could be lifted with local allocation of the
resolved peer data, but implementing that was out of scope.
- Resolved peer addresses are not carried over to a new generation of
workers during configuration reload (see below).
- Tests still require some cleanup and will be published later.
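To make the zone sizing estimate from the list above concrete, here is a
minimal sketch of the arithmetic. It is not part of the series: the
helper name is hypothetical, "2k" is assumed to mean 2048 bytes, and
the result is a rough lower bound that ignores any allocator overhead
inside the zone.

```c
#include <stddef.h>

/* Rough per-entry costs quoted above; "2k" is assumed to be 2048 bytes. */
#define SERVER_ENTRY_BYTES   ((size_t) 2048)
#define RESOLVED_ADDR_BYTES  ((size_t) 2048)

/*
 * Hypothetical helper, not an nginx function: rough lower bound for the
 * shared zone size of one upstream block.
 *
 *   servers         number of server entries configured in the block
 *   resolved_addrs  total number of addresses those names resolve to
 */
size_t
estimate_upstream_zone_size(size_t servers, size_t resolved_addrs)
{
    return servers * SERVER_ENTRY_BYTES
           + resolved_addrs * RESOLVED_ADDR_BYTES;
}
```

For example, 4 server entries resolving to 32 addresses in total give
(4 + 32) * 2k = 72k, so the configured zone should be at least that
large, with some headroom on top.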
Peer list population delay
In the case of a cold start, a reload or a binary upgrade, upstreams
that contain only resolvable servers will have an empty list of peers.
This leads to a short delay before Nginx is able to send traffic to
the upstream. There's no perfect solution for that: if the
server list in the configuration has changed, it's no longer compatible
with the data we collected for a previous config. If the resolver
parameters were modified, we may get an entirely different set of
The following options were considered:
- Publishing the preresolve code from Nginx Plus as is.
The solution involves copying peer states from the non-reusable zone
of a previous generation of workers. This only addresses the reload
case and may result in stale peer data if the configuration
The advantage of this code is that it is heavily tested and has been
running in multiple production environments for many years.
- Sharing the zone between all generations of workers.
This requires some changes in the code, notably improving reference
counting and cleanup for peer data in the shared zone (as we're no
longer able to discard the old zone with all the allocated data) and
tracking the upstream configuration compatibility. It also doesn't
work when the zone size has changed in the config.
The approach leads to increased memory requirements: zone size should
be configured to accommodate multiple generations of workers, and we
are aware of deployments that have lots of those due to long-living
connections. Nginx OSS does not offer any means to monitor shared
memory usage at the moment, so I fear this approach will hurt a lot of
unsuspecting users who haven't reserved enough memory.
There are also performance concerns, as access to the same list of
peers from multiple generations of workers would increase lock
contention (and the situation is already not looking good with
round-robin lb). We can copy the peers instead of attempting to
reuse them, but that prevents us from optimizing the memory usage.
- Queueing the requests until we finish the initial cycle of name
resolution ('queue' directive of the ngx_http_upstream_module).
This option adds a latency spike at the moment of configuration
reload. There's also an issue with propagation of the upstream
readiness state to all the worker processes - we need an event
passing channel to be able to resume queued requests immediately.
On the positive side, this would mitigate downtime for all 3
scenarios, as long as the queue capacity is sufficient.
Given the latency spike, it doesn't seem to be a good standalone
solution. But it might be a nice addition to one of the options
Alternatives like pre-resolving servers during configuration load were
not considered due to complexity and significant disadvantages.
Maxim, from the list archives I understand that you had a negative
opinion on the current approach with noreuse zones and pre-resolve,
but I'm afraid there wasn't enough context to understand all the sides
of that discussion. I'd appreciate it if you shared your thoughts on the
problem and on the approach you consider architecturally correct.