Open-sourcing periodic upstream server resolution and implementing a dedicated service worker.

Wed Jul 20 20:36:03 UTC 2022

Hello!

On Wed, Jul 20, 2022 at 07:08:40PM +0000, Vladimir Kokshenev via nginx-devel wrote:

> Hello!
> 
> This is the two-part proposal to open-source periodic upstream server resolution
> and implement a dedicated service worker for nginx. The purpose of this e-mail
> is to describe the WHY and solicit feedback.
> 
> Nginx supports domain names in the upstream server configuration.
> Currently, domain names are resolved at configuration time only,
> and there are no subsequent name resolutions.
> 
> There are plans to open-source re-resolvable upstream servers.
> This will allow applying DNS updates to upstream configurations at runtime.
> So, there is a need to support periodic asynchronous operations.
> And a dedicated service worker is a possible architectural way to address this.
> 
> The master process reads and parses configuration and creates the service worker
> when needed (in a similar way to cache-related processes).
> 
> The service worker manages periodic name resolutions and updates corresponding
> upstream configurations. The name resolution relies on the existing nginx
> resolver and upstream zone functionality.
> 
> The service worker will be responsible solely for periodic background tasks
> and wouldn't accept client connections.
> 
> The service worker should be the last worker process to shut down
> to maintain the actual state of upstreams when there are active workers.
> 
> The alternative architecture considered was choosing one of the regular
> workers (e.g., worker zero) to take care of periodic upstream server resolution,
> but it creates asymmetry in responsibilities and load for this dedicated worker.

Both alternatives look bad to me.

We already have dedicated processes to load and manage caches, and
I tend to think that the only thing which somehow justifies these
being dedicated is that cache management implies disk-intensive
blocking operations.  Mixing these with normal request processing
would cause latency issues, not to mention being non-trivial to
implement.  On the other hand, we are already seeing issues with
the dedicated process approach: in some configurations just one
cache manager process simply isn't enough to remove all the files
being added by many worker processes.

As such, I generally tend to think that dedicated processes are the
wrong way to go.  With special requirements like "last to shut
down", the whole idea becomes even worse.  And in this particular
case all operations are perfectly asynchronous, so there is no
justification like the one in the cache manager case.

Similarly, "choosing one of the regular workers (e.g., worker
zero)" looks wrong (and I've already provided this feedback
before).  All workers are expected to be equal, and doing
something only in a particular worker is expected to cause issues.
E.g., consider a worker that is stopped (due to a bug, or
intentionally to debug an issue): this shouldn't disrupt the
operation of other workers.

Rather, I would recommend doing all periodic tasks in a way which
doesn't depend on being run in a particular worker.  The simplest
approach would be to run tasks in all worker processes, with some
minimal checks to avoid duplicate work.

-- 
Maxim Dounin
http://mdounin.ru/