Hi,<br><br><div class="gmail_quote">On Sat, Jan 14, 2012 at 2:16 AM, Maxim Dounin <span dir="ltr"><<a href="mailto:mdounin@mdounin.ru" target="_blank">mdounin@mdounin.ru</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">


Hello!<br>

<div><br>

On Thu, Jan 12, 2012 at 11:37:00PM +0800, Joshua Zhu wrote:<br>

<br>

> Hi,<br>

><br>

> 'cpu_set_t' should be used instead of 'long' when calling<br>

> sched_setaffinity(2). Otherwise it may fail on some Linux systems.<br>

> Please see the attachment for more detail.<br>

<br>

</div>According to sched_setaffinity(2) at least in kernels >= 2.6.9 it<br>

should be ok to use any length as long as bits above kernel mask<br>

used aren't referenced.<br>

<br></blockquote><div><br>I've tested it on Linux 2.6.9 (Red Hat Enterprise Linux AS release 4) and unfortunately sched_setaffinity(2) failed.<br> </div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">


I believe the real problem is that code uses 32 as cpusetsize,<br>

while this is the size in bytes.  Hence it references arbitrary<br>

memory and this causes EINVAL to be returned by glibc if there are<br>

bits referenced above kernel cpumask size, see<br>

sysdeps/unix/sysv/linux/sched_setaffinity.c in glibc's sources.<br>

<br>

The following patch should be enough:<br>

<br>

diff -r e1296af53cc0 src/os/unix/ngx_process_cycle.c<br>

--- a/src/os/unix/ngx_process_cycle.c   Mon Dec 26 00:00:00 2011 +0400<br>

+++ b/src/os/unix/ngx_process_cycle.c   Fri Jan 13 11:58:01 2012 -0500<br>

@@ -914,7 +914,10 @@<br>

         ngx_log_error(NGX_LOG_NOTICE, cycle->log, 0,<br>

                       "sched_setaffinity(0x%08Xl)", cpu_affinity);<br>

<br>

-        if (sched_setaffinity(0, 32, (cpu_set_t *) &cpu_affinity) == -1) {<br>

+        if (sched_setaffinity(0, sizeof(cpu_affinity),<br>

+                              (cpu_set_t *) &cpu_affinity)<br>

+            == -1)<br>

+        {<br>

             ngx_log_error(NGX_LOG_ALERT, cycle->log, ngx_errno,<br>

                           "sched_setaffinity(0x%08Xl) failed", cpu_affinity);<br>

         }<br>

<br>

Could you please test if it works for you?<br></blockquote><div><br>Yes, it works on one of my Linux box which has two cores. But I have not tested it on more other systems yet.<br> </div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">


<br>

It passes my limited testing, but I wasn't really able to<br>

reproduce the problem without modifying the sched_setaffinity()<br>

call in question to use bigger cpusetsize.<br>

<br>

[...]<br>

<br>

> diff -uprN nginx-1.1.12/src/core/nginx.c nginx-1.1.12-bugfix/src/core/nginx.c<br>

> --- nginx-1.1.12/src/core/nginx.c     2011-11-16 04:35:42.000000000 +0800<br>

> +++ nginx-1.1.12-bugfix/src/core/nginx.c      <a href="tel:2012-01-12%2011" value="+12012011211" target="_blank">2012-01-12 11</a>:06:00.000000000 +0800<br>

> @@ -1245,15 +1245,15 @@ ngx_set_cpu_affinity(ngx_conf_t *cf, ngx<br>

>      ngx_core_conf_t  *ccf = conf;<br>

><br>

>      u_char            ch;<br>

> -    u_long           *mask;<br>

> +    cpu_set_t        *mask;<br>

>      ngx_str_t        *value;<br>

> -    ngx_uint_t        i, n;<br>

> +    ngx_uint_t        i, j, n;<br>

><br>

>      if (ccf->cpu_affinity) {<br>

>          return "is duplicate";<br>

>      }<br>

><br>

> -    mask = ngx_palloc(cf->pool, (cf->args->nelts - 1) * sizeof(long));<br>

> +    mask = ngx_palloc(cf->pool, (cf->args->nelts - 1) * sizeof(cpu_set_t));<br>

>      if (mask == NULL) {<br>

>          return NGX_CONF_ERROR;<br>

>      }<br>

<br>

[...]<br>

<br>

While I generally agree that using cpu_set_t and CPU_* macros<br>

explicitly would be better, I don't think we want to introduce<br>

linux-centric code here in platform-independent code.<br>

<br></blockquote><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Instead we probably want to leave this code generic enough to be<br>

useable with other scheduler interfaces, and convert the data<br>

somewhere near platform-specific whatever_setaffinity() calls if<br>

needed.  (Likely unneeded for the Linux with the above patch.)<br>

<br></blockquote></div><br>We use cpu_set_t and CPU_* macros in the patch not only because it's 

more generic and not depend on the implementation of the kernel (I guess

 the code in Nginx now is inspired by some _early_ kernel implementation, i.e. Linux 2.5.x), but also we will have no 32-core-limitation any more ('"worker_cpu_affinity" supports up to 32 CPU only').<br>

<br>

I agree that the set_cpu_affinity related code in Nginx core should be platform-independent. But the code now is _not_ (please look at those NGX_HAVE_SCHED_SETAFFINITY macros). And I do recommend move the implementation of ngx_set_cpu_affinity() to the os/ directory when worker_cpu_affinity is supported on other platforms.<br>


<br>Thank you very much for the review :)<br><br>Regards,<br clear="all"><br>-- <br>Joshua Zhu<br>Senior Software Engineer<br>Server Platforms Team at Taobao<br>