[Patch] SO_REUSEPORT support from master process

Lu, Yingqi yingqi.lu at intel.com
Tue Sep 23 15:35:47 UTC 2014


I am wondering if there is any further feedback, or any comments or concerns?

Also, does the format look OK to all of you? 

Thanks,
Yingqi

-----Original Message-----
From: nginx-devel-bounces at nginx.org [mailto:nginx-devel-bounces at nginx.org] On Behalf Of Lu, Yingqi
Sent: Saturday, September 20, 2014 4:45 PM
To: nginx-devel at nginx.org
Subject: RE: [Patch] SO_REUSEPORT support from master process

Dear All,

I just sent the new patch directly from TortoiseHg. Thanks again to Valentin Bartenev for his valuable feedback! The two changes from the last version are:

1. We now use "total CPU threads" instead of "active CPU threads" to calculate the number of duplicated listeners.
2. We modified some of the code to meet the style requirements listed at http://nginx.org/en/docs/contributing_changes.html.

However, the message was automatically titled "[PATCH] SO_REUSEPORT support from master process" instead of "[Patch] SO_REUSEPORT support from master process", so it does not show up in this thread. The patch is available in a separate thread at http://forum.nginx.org/read.php?29,253446.

Can anyone help move the patch to the main thread and delete the duplicate copies?

By the way, if you use Outlook and see a small message at the top saying "We removed extra line breaks from this message", please click on it and choose "Restore line breaks"; otherwise, the code in the message is incomplete. The patch is tested locally, but I am concerned about format changes during email delivery. Please let me know if the format looks wrong or the patch does not apply.

Thank you very much for your help!
Yingqi

-----Original Message-----
From: Lu, Yingqi 
Sent: Saturday, September 20, 2014 4:00 PM
To: nginx-devel at nginx.org
Subject: RE: [Patch] SO_REUSEPORT support from master process

Dear All,

After I sent out the patch, I noticed that the patch inserted in the email has a different format than the original one generated by TortoiseHg. All I did was copy it and paste it into the email. Did I miss anything? I guess this is not the correct way to send the patch.

I will try to find a way to send it directly from TortoiseHg.

Thanks,
Yingqi

-----Original Message-----
From: nginx-devel-bounces at nginx.org [mailto:nginx-devel-bounces at nginx.org] On Behalf Of Lu, Yingqi
Sent: Saturday, September 20, 2014 3:23 PM
To: nginx-devel at nginx.org
Subject: RE: [Patch] SO_REUSEPORT support from master process

Dear All,

Here is another updated version of the patch. Thanks very much to Valentin Bartenev for his valuable feedback! Here are the changes in this version:

1. We now use "total CPU threads" instead of "active CPU threads" to calculate the number of duplicated listeners.
2. We modified some of the code to meet the style requirements listed at http://nginx.org/en/docs/contributing_changes.html.
 
Here is how we created the patch below: we cloned the code from http://hg.nginx.org/nginx, modified 4 files, and committed the changes. After that, we exported the patch and attached it here. Please let me know if anything is missing. This is the first time we have prototyped a patch for Nginx. Your comments and feedback are highly appreciated.

Thanks,
Yingqi

# HG changeset patch
# User Yingqi Lu <Yingqi.Lu at intel.com>
# Date 1411248953 25200
#      Sat Sep 20 14:35:53 2014 -0700
# Node ID 04abfbb10a7f5e4efd5187d1382a880483a79294
# Parent  43512a33e8f232fb6f256b90e76ac4f1472a0c1f
Added SO_REUSEPORT support.

It duplicates and configures a certain number of listen sockets in the master process with SO_REUSEPORT enabled, and all the worker processes inherit them.
The number of listen sockets to duplicate is calculated from the total number of CPU threads: on big systems with more CPU threads, more duplicated listen sockets are created to improve throughput and scalability.
On a system with 8 or fewer CPU threads, there is only 1 listen socket. Duplicated listen sockets are only created when necessary. In case SO_REUSEPORT is not supported by the OS, it falls back to the default behavior.
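
As background, and not part of the patch itself: with SO_REUSEPORT, several sockets can bind(2) and listen(2) on the same address and port, and the kernel distributes incoming connections among them. A minimal standalone C sketch of the idea follows (error handling abbreviated; the SO_REUSEPORT fallback value 15 is the Linux one, mirroring the patch below):

#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_REUSEPORT
#define SO_REUSEPORT 15    /* Linux value, as in the patch below */
#endif

static int
open_reuseport_listener(unsigned short port)
{
    int                 s, one = 1;
    struct sockaddr_in  sin;

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s == -1) {
        return -1;
    }

    /* must be set before bind(); fails on kernels without support */
    if (setsockopt(s, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(int)) == -1) {
        close(s);
        return -1;    /* caller falls back to a single shared socket */
    }

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);

    if (bind(s, (struct sockaddr *) &sin, sizeof(sin)) == -1
        || listen(s, 511) == -1)
    {
        close(s);
        return -1;
    }

    return s;
}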

diff -r 43512a33e8f2 -r 04abfbb10a7f src/core/ngx_connection.c
--- a/src/core/ngx_connection.c	Wed Aug 13 15:11:45 2014 +0400
+++ b/src/core/ngx_connection.c	Sat Sep 20 14:35:53 2014 -0700
@@ -304,7 +304,7 @@
 ngx_int_t
 ngx_open_listening_sockets(ngx_cycle_t *cycle)
 {
-    int               reuseaddr;
+    int               reuseaddr, reuseport;
     ngx_uint_t        i, tries, failed;
     ngx_err_t         err;
     ngx_log_t        *log;
@@ -312,6 +312,7 @@
     ngx_listening_t  *ls;
 
     reuseaddr = 1;
+    reuseport = 1;
 #if (NGX_SUPPRESS_WARN)
     failed = 0;
 #endif
@@ -370,6 +371,24 @@
                 return NGX_ERROR;
             }
 
+            if (ngx_so_reuseport_enabled)
+            {
+                if (setsockopt(s, SOL_SOCKET, SO_REUSEPORT,
+                               (const void *) &reuseport, sizeof(int))
+                    == -1) {
+                    ngx_log_error(NGX_LOG_EMERG, log, ngx_socket_errno,
+                                  "setsockopt(SO_REUSEPORT) %V failed",
+                                  &ls[i].addr_text);
+                    if (ngx_close_socket(s) == -1) {
+                        ngx_log_error(NGX_LOG_EMERG, log, ngx_socket_errno,
+                                      ngx_close_socket_n " %V failed",
+                                      &ls[i].addr_text);
+                    }
+
+                    return NGX_ERROR;
+                }
+            }
+
 #if (NGX_HAVE_INET6 && defined IPV6_V6ONLY)
 
             if (ls[i].sockaddr->sa_family == AF_INET6) {
diff -r 43512a33e8f2 -r 04abfbb10a7f src/core/ngx_cycle.c
--- a/src/core/ngx_cycle.c	Wed Aug 13 15:11:45 2014 +0400
+++ b/src/core/ngx_cycle.c	Sat Sep 20 14:35:53 2014 -0700
@@ -26,6 +26,9 @@
 ngx_uint_t             ngx_test_config;
 ngx_uint_t             ngx_quiet_mode;
 
+ngx_uint_t             ngx_so_reuseport_enabled;
+ngx_uint_t             ngx_num_dup_sockets;
+
 #if (NGX_THREADS)
 ngx_tls_key_t          ngx_core_tls_key;
 #endif
@@ -54,7 +57,39 @@
     ngx_core_conf_t     *ccf, *old_ccf;
     ngx_core_module_t   *module;
     char                 hostname[NGX_MAXHOSTNAMELEN];
+    ngx_uint_t           num_cores, taken;
+    ngx_socket_t         temp_s;
+    int                  one = 1;
 
+    /* check if SO_REUSEPORT feature is enabled */
+    temp_s = ngx_socket(AF_INET, SOCK_STREAM, 0);
+
+#ifndef SO_REUSEPORT
+#define SO_REUSEPORT 15
+#endif
+    if (setsockopt(temp_s, SOL_SOCKET, SO_REUSEPORT,
+                  (const void *) &one, sizeof(int)) == 0) {
+        ngx_so_reuseport_enabled = 1;
+    } else {
+        ngx_so_reuseport_enabled = 0;
+    }
+    ngx_close_socket(temp_s);
+
+    if (ngx_so_reuseport_enabled) {
+#ifdef _SC_NPROCESSORS_CONF
+        num_cores = sysconf(_SC_NPROCESSORS_CONF);
+#else
+        num_cores = 1;
+#endif
+        if (num_cores > 8) {
+            ngx_num_dup_sockets = num_cores/8;
+        } else {
+            ngx_num_dup_sockets = 1;
+        }
+    } else {
+        ngx_num_dup_sockets = 1;
+    }
+ 
     ngx_timezone_update();
 
     /* force localtime update with a new timezone */
@@ -114,7 +149,8 @@
     }
 
 
-    n = old_cycle->paths.nelts ? old_cycle->paths.nelts : 10;
+    n = old_cycle->paths.nelts ? old_cycle->paths.nelts :
+        10 * ngx_num_dup_sockets;
 
     cycle->paths.elts = ngx_pcalloc(pool, n * sizeof(ngx_path_t *));
     if (cycle->paths.elts == NULL) {
@@ -164,7 +200,8 @@
         return NULL;
     }
 
-    n = old_cycle->listening.nelts ? old_cycle->listening.nelts : 10;
+    n = old_cycle->listening.nelts ? old_cycle->listening.nelts :
+        10 * ngx_num_dup_sockets;
 
     cycle->listening.elts = ngx_pcalloc(pool, n * sizeof(ngx_listening_t));
     if (cycle->listening.elts == NULL) {
@@ -231,7 +268,8 @@
 
     ngx_memzero(&conf, sizeof(ngx_conf_t));
     /* STUB: init array ? */
-    conf.args = ngx_array_create(pool, 10, sizeof(ngx_str_t));
+    conf.args = ngx_array_create(pool, 10 * ngx_num_dup_sockets,
+                                 sizeof(ngx_str_t));
     if (conf.args == NULL) {
         ngx_destroy_pool(pool);
         return NULL;
@@ -486,6 +524,7 @@
         }
 
         nls = cycle->listening.elts;
+        taken = 0;
         for (n = 0; n < cycle->listening.nelts; n++) {
 
             for (i = 0; i < old_cycle->listening.nelts; i++) {
@@ -493,9 +532,9 @@
                     continue;
                 }
 
-                if (ngx_cmp_sockaddr(nls[n].sockaddr, nls[n].socklen,
+                if ((ngx_cmp_sockaddr(nls[n].sockaddr, nls[n].socklen,
                                      ls[i].sockaddr, ls[i].socklen, 1)
-                    == NGX_OK)
+                    == NGX_OK) && i >= taken)
                 {
                     nls[n].fd = ls[i].fd;
                     nls[n].previous = &ls[i];
@@ -540,6 +579,7 @@
                         nls[n].add_deferred = 1;
                     }
 #endif
+                    taken = i + 1;
                     break;
                 }
             }
@@ -747,7 +787,7 @@
             exit(1);
         }
 
-        n = 10;
+        n = 10 * ngx_num_dup_sockets;
         ngx_old_cycles.elts = ngx_pcalloc(ngx_temp_pool,
                                           n * sizeof(ngx_cycle_t *));
         if (ngx_old_cycles.elts == NULL) {
diff -r 43512a33e8f2 -r 04abfbb10a7f src/core/ngx_cycle.h
--- a/src/core/ngx_cycle.h	Wed Aug 13 15:11:45 2014 +0400
+++ b/src/core/ngx_cycle.h	Sat Sep 20 14:35:53 2014 -0700
@@ -136,6 +136,8 @@
 extern ngx_module_t           ngx_core_module;
 extern ngx_uint_t             ngx_test_config;
 extern ngx_uint_t             ngx_quiet_mode;
+extern ngx_uint_t             ngx_so_reuseport_enabled;
+extern ngx_uint_t             ngx_num_dup_sockets;
 #if (NGX_THREADS)
 extern ngx_tls_key_t          ngx_core_tls_key;
 #endif
diff -r 43512a33e8f2 -r 04abfbb10a7f src/http/ngx_http.c
--- a/src/http/ngx_http.c	Wed Aug 13 15:11:45 2014 +0400
+++ b/src/http/ngx_http.c	Sat Sep 20 14:35:53 2014 -0700
@@ -1671,7 +1671,7 @@
 static ngx_int_t
 ngx_http_init_listening(ngx_conf_t *cf, ngx_http_conf_port_t *port)
 {
-    ngx_uint_t                 i, last, bind_wildcard;
+    ngx_uint_t                 i, j, last, bind_wildcard;
     ngx_listening_t           *ls;
     ngx_http_port_t           *hport;
     ngx_http_conf_addr_t      *addr;
@@ -1703,40 +1703,42 @@
             continue;
         }
 
-        ls = ngx_http_add_listening(cf, &addr[i]);
-        if (ls == NULL) {
-            return NGX_ERROR;
-        }
-
-        hport = ngx_pcalloc(cf->pool, sizeof(ngx_http_port_t));
-        if (hport == NULL) {
-            return NGX_ERROR;
-        }
-
-        ls->servers = hport;
-
-        if (i == last - 1) {
-            hport->naddrs = last;
-
-        } else {
-            hport->naddrs = 1;
-            i = 0;
-        }
-
-        switch (ls->sockaddr->sa_family) {
-
-#if (NGX_HAVE_INET6)
-        case AF_INET6:
-            if (ngx_http_add_addrs6(cf, hport, addr) != NGX_OK) {
+        for (j = 0; j < ngx_num_dup_sockets; j++) {
+            ls = ngx_http_add_listening(cf, &addr[i]);
+            if (ls == NULL) {
                 return NGX_ERROR;
             }
-            break;
-#endif
-        default: /* AF_INET */
-            if (ngx_http_add_addrs(cf, hport, addr) != NGX_OK) {
+
+            hport = ngx_pcalloc(cf->pool, sizeof(ngx_http_port_t));
+            if (hport == NULL) {
                 return NGX_ERROR;
             }
-            break;
+
+            ls->servers = hport;
+
+            if (i == last - 1) {
+                hport->naddrs = last;
+
+            } else {
+                hport->naddrs = 1;
+                i = 0;
+            }
+
+            switch (ls->sockaddr->sa_family) {
+
+#if (NGX_HAVE_INET6)
+            case AF_INET6:
+                if (ngx_http_add_addrs6(cf, hport, addr) != NGX_OK) {
+                    return NGX_ERROR;
+                }
+                break;
+#endif
+            default: /* AF_INET */
+                if (ngx_http_add_addrs(cf, hport, addr) != NGX_OK) {
+                    return NGX_ERROR;
+                }
+                break;
+            }
         }
 
         addr++;

-----Original Message-----
From: nginx-devel-bounces at nginx.org [mailto:nginx-devel-bounces at nginx.org] On Behalf Of Lu, Yingqi
Sent: Friday, September 19, 2014 10:34 AM
To: nginx-devel at nginx.org
Subject: RE: [Patch] SO_REUSEPORT support from master process

Hi Valentin,

I just tested the exact case you mentioned and you are right! There is a connection drop, but it recovers very quickly. After the reload, new connections go only to the remaining listen sockets, in a round-robin fashion.

I will change the check from "active CPUs" to "all CPUs". This makes the number of listen sockets constant across reloads, regardless of how many CPUs are online. Thank you very much for your hint!
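
For reference, here is a tiny standalone C sketch (not part of the patch) showing the two sysconf(3) queries in question: _SC_NPROCESSORS_ONLN counts only the CPUs currently online, so it changes when CPUs are offlined via sysfs, while _SC_NPROCESSORS_CONF counts all configured CPUs and stays constant across reloads:

#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    /* drops when cores are offlined via /sys/devices/system/cpu */
    long online = sysconf(_SC_NPROCESSORS_ONLN);

    /* all configured CPUs: stable while the system stays up */
    long configured = sysconf(_SC_NPROCESSORS_CONF);

    printf("online CPUs:     %ld\n", online);
    printf("configured CPUs: %ld\n", configured);

    return 0;
}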

I will also check the coding style again and resend the patch.

Thanks,
Yingqi

-----Original Message-----
From: nginx-devel-bounces at nginx.org [mailto:nginx-devel-bounces at nginx.org] On Behalf Of Valentin V. Bartenev
Sent: Friday, September 19, 2014 9:48 AM
To: nginx-devel at nginx.org
Subject: Re: [Patch] SO_REUSEPORT support from master process

On Friday 19 September 2014 15:52:35 Lu, Yingqi wrote:
> Hi Valentin,
> 
> Thanks very much for your email.
> 
> Regarding your first question, I did the following test:
> 1. Start Nginx.
> 2. Offline half of the CPU threads with "echo 0 > /sys/devices/system/cpu/cpu#/online".
> 3. Reload Nginx and check the worker processes; the number of listen sockets is reduced to half.
> 4. Put the offlined CPU threads back online and reload Nginx again.
> 5. The new worker process now has 2x the listen sockets, as the number of CPU threads has doubled since the previous reload.
> 
> Every time Nginx reloads, my understanding is that it runs ngx_init_cycle, where the CPU number check is. Is the above test the correct way to verify that reload works? Please let me know if I missed anything.
[..]

What happens if you shut down half of your CPUs again and try to reload nginx under high load?

If I understand correctly, with your patch it will close half of the listen sockets, and all the connections waiting to be accepted on those sockets will be lost.

  wbr, Valentin V. Bartenev

_______________________________________________
nginx-devel mailing list
nginx-devel at nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel
