[PATCH] Allow binary upgrades in Solaris zones

Maxim Dounin mdounin at mdounin.ru
Thu Jan 6 02:53:36 MSK 2011


Hello!

On Wed, Jan 05, 2011 at 07:22:08PM +0000, doug at hcsw.org wrote:

> Hello nginx-devel,
> 
> Thank you very much for nginx.
> 
> When running nginx in a Solaris zone, I am unable to do a binary upgrade without
> fully stopping and starting nginx. When I send the master process a USR2 signal,
> it refuses to do the upgrade and writes the following log message:
> 
> 2011/01/04 16:00:23 [crit] 3818#0: the changing binary signal is ignored: you should shutdown or terminate before either old or new binary's process
> 
> After looking at the code, it seems that nginx assumes if the master process's parent
> does not have PID == 1, then nginx is not running in stand-alone daemon mode and the
> upgrade should not be attempted.
> 
> My problem is that in Solaris zones the master process's parent is actually the
> zsched process and this never has PID == 1. The real init process is not visible
> inside the zone at all.

Yes, nginx decides whether the previous upgrade has finished by 
checking that its parent pid is 1.  This behaviour is indeed not 
portable, as POSIX[1] says:

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/_Exit.html

% The parent process ID of all of the existing child processes and 
% zombie processes of the calling process shall be set to the 
% process ID of an implementation-defined system process. That is, 
% these processes shall be inherited by a special system process.

...

% Historically, the implementation-defined process that inherits 
% children whose parents have terminated without waiting on them is 
% called init and has a process ID of 1.

So basically nginx relies on historical behaviour.
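
To make the assumption concrete, here is a minimal sketch of a
check of that kind.  It is an illustration only, not the actual
nginx source; the function names are hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Historical assumption: an orphaned process is always adopted by
 * init, and init always has pid 1.  POSIX only promises "an
 * implementation-defined system process".
 */
static int
adopted_by_pid1(pid_t ppid)
{
    return ppid == 1;
}

/* Hypothetical helper: 1 if the old master is believed to be gone. */
static int
old_master_gone_historical(void)
{
    return adopted_by_pid1(getppid());
}
```

In a Solaris zone the adopting process is zsched, whose pid is not
1, so a check like this never reports that the old master is gone.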

> I am attaching a patch against 0.9.3 that (only if NGX_SOLARIS is defined) checks to
> see if a root process can send a signal to init and, if not, assumes we are running
> in a zone and goes ahead with the binary upgrade. With this patch I am able to do
> 0-downtime binary upgrades in Solaris zones with no problems. Any other solutions
> would also be appreciated.

I don't really like this approach: it looks fragile and actually 
adds another non-portable hack instead of fixing the original 
non-portability.

Probably passing the real parent pid from the old binary and 
checking whether getppid() no longer matches it would be a better 
approach (at least, it should be portable).
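
A rough sketch of that idea, not a patch.  It assumes the pid is
handed over in an environment variable; the variable name and both
function names are hypothetical, not existing nginx identifiers.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical variable name, chosen only for this sketch. */
#define OLD_MASTER_PID_ENV  "SKETCH_OLD_MASTER_PID"

/* Old binary: export our own pid before exec'ing the new binary. */
static int
export_master_pid(void)
{
    char  buf[32];

    snprintf(buf, sizeof(buf), "%ld", (long) getpid());
    return setenv(OLD_MASTER_PID_ENV, buf, 1);
}

/*
 * New binary: returns 1 while the old master is still our parent,
 * 0 once we have been re-parented (or were started normally).
 */
static int
old_master_is_parent(void)
{
    const char  *value;
    char        *end;
    long         pid;

    value = getenv(OLD_MASTER_PID_ENV);
    if (value == NULL) {
        return 0;                       /* no old master recorded */
    }

    pid = strtol(value, &end, 10);
    if (end == value || *end != '\0' || pid <= 0) {
        return 0;                       /* malformed value, ignore it */
    }

    return getppid() == (pid_t) pid;
}
```

With something like this, the new binary would ignore the USR2 
signal only while old_master_is_parent() returns 1, whatever pid 
the system process that adopts orphans happens to have.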

Maxim Dounin


