Upgrading Executable on the Fly - wrong docs?

Tue Feb 19 11:54:53 UTC 2013

On Tue, Feb 12, 2013 at 03:01:39PM -0500, piotr.dobrogost wrote:
> Ruslan, thanks for quick reply.
> 
> I have some trouble comparing the new wording with the previous one as it
> looks like your change went live at http://nginx.org/en/docs/control.html so
> I do not have the old one to compare any more :)

Already answered.

> Nevertheless I have some more comments on the new (current) one.
> 
> I think an error sneaked into the new version. The first bullet is now
> "Send the HUP signal to the old master process. The old master process will
> start new worker processes without re-reading the configuration. After that,
> all new processes can be shut down gracefully, by sending the QUIT signal to
> the old master process."
> I think it should have been "(...) by sending the QUIT signal to the new
> master process." instead.

Thanks for spotting this, the fixed version is already on site.

> What I don't understand is why the old master process does not re-read the
> configuration after receiving the HUP signal as at the top of the page it's
> written
> HUP (...), starting new worker processes with a new configuration, (...)
> If the reason is because it had received the USR2 signal at the beginning of
> the whole procedure and this changed its state (it "remembers" receiving the
> USR2 signal) it should be explained.

HUP after USR2 is handled differently, exactly as documented.
When the master process knows it's "old" (i.e., the upgrade procedure
is in progress), a request to start new worker processes is
interpreted as a rollback request -- the master starts new worker
and cache manager processes with the old configuration.
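
To make the state dependence concrete, here is a toy model in Python.
It is only a sketch of the behaviour described above, not nginx source
code; the class name, method names and return values are all made up
for illustration.

```python
class MasterModel:
    """Toy model of how a master process interprets HUP (hypothetical, not nginx code)."""

    def __init__(self, config):
        self.loaded_config = config        # configuration currently held in memory
        self.upgrade_in_progress = False   # becomes True once USR2 has been handled

    def on_usr2(self):
        # USR2 starts the new executable; from now on this master is the "old" one.
        self.upgrade_in_progress = True

    def on_hup(self, config_on_disk):
        if self.upgrade_in_progress:
            # Rollback request: start workers with the configuration already loaded,
            # without re-reading anything from disk.
            return ("start_workers", self.loaded_config)
        # Ordinary reload: re-read the configuration, then start workers with it.
        self.loaded_config = config_on_disk
        return ("start_workers", self.loaded_config)
```

The same HUP signal thus yields different results depending on whether
USR2 was received earlier.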

> Also, maybe I'm missing something but I think that the two bullets are not
> symmetrical without a reason. In the first bullet the QUIT signal is used
> whereas in the second bullet the TERM signal is used. I believe either of
> them could be used with the obvious difference of fast vs graceful shutdown.
> If it's true (either could be used) then using different signals between the
> first and the second bullet is misleading.

These are two different procedures with different properties.

In the first case, you restart old workers with the old configuration,
but let requests that are currently in flight be fully processed
(if you can tolerate this).  There's no interruption in handling
requests.

In the second case, you want to stop new workers right away (e.g.,
something so odd happened that you can't even let in-flight
requests finish), and it requires only a single action from
you to roll back (or none at all if e.g. a new binary process
segfaults).  But there's a small window where connection attempts
may be rejected.
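
The two rollback procedures can be sketched as follows.  This is only
an illustration in Python: the function names and the injectable
`send` parameter are assumptions of this sketch, and a real script
would also wait for the old master's workers to come up before
sending QUIT.

```python
import os
import signal


def rollback_graceful(old_master_pid, new_master_pid, send=os.kill):
    """First case: no interruption; in-flight requests are allowed to finish."""
    # Ask the old master to start workers again (with the old configuration).
    send(old_master_pid, signal.SIGHUP)
    # Then shut the new processes down gracefully via the new master.
    send(new_master_pid, signal.SIGQUIT)


def rollback_fast(new_master_pid, send=os.kill):
    """Second case: a single action; new workers are stopped right away."""
    # Once the new master exits, the old master starts its workers on its own.
    send(new_master_pid, signal.SIGTERM)
```

Passing a different `send` callable makes it possible to dry-run the
sequence without signalling any real process.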

Of course one may come up with other procedures, like starting old
workers and immediately stopping new processes, but how is this
practically different from the first case?  Or one can gracefully
stop new workers (new requests will be rejected, but those in flight
will be serviced, potentially indefinitely), and only after that
will old workers be restarted and new requests handled
(sorry, but such a procedure doesn't make any sense to me).

> Additionally I have a question regarding the following fragment:
> "In order to upgrade the server executable, the new executable file should
> be put in place of an old file first. After that USR2 signal should be sent
> to the master process. The master process first renames its file (...)
> How can the master process rename its file if this file is already gone i.e.
> it had been replaced by the new executable?

Read further: it "renames its file with the process ID", i.e. the
pid file, not the executable; see http://nginx.org/r/pid
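
As an illustration, a script could find both master processes during
an upgrade from the pid file and its renamed copy.  This is a minimal
sketch in Python, assuming the renamed file carries the `.oldbin`
suffix described on the control page; the function name and the
returned dictionary are hypothetical.

```python
from pathlib import Path


def read_master_pids(pid_path):
    """Return the PIDs of the new and old master processes, or None where absent."""
    new_file = Path(pid_path)
    # Assumption: the old master renamed its pid file by appending ".oldbin".
    old_file = Path(str(pid_path) + ".oldbin")

    def read_pid(path):
        return int(path.read_text().strip()) if path.exists() else None

    return {"new": read_pid(new_file), "old": read_pid(old_file)}
```

The pid file location itself depends on the `pid` directive, so the
path has to be supplied by the caller.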