SSL_shutdown() failed (SSL: ... bad write retry)

Maxim Dounin mdounin at mdounin.ru
Sat Aug 15 22:54:46 UTC 2020


Hello!

On Fri, Aug 14, 2020 at 10:34:49AM -0400, vergil wrote:

> Maxim Dounin Wrote:
> -------------------------------------------------------
> > Hello!
> > 
> > On Thu, Aug 13, 2020 at 12:11:54PM -0400, vergil wrote:
> > 
> > > Maxim Dounin Wrote:
> > > -------------------------------------------------------
> > > > Hello!
> > > > 
> > > > On Thu, Aug 13, 2020 at 11:39:36AM -0400, vergil wrote:
> > > > 
> > > > > This one was hard to catch.
> > > > > 
> > > > > I've captured one error with 30 seconds delta before and after
> > the
> > > > event.
> > > > > Where can i attach log file for you? There's 400K messages, so i
> > > > cannot
> > > > > simple put it here.
> > > > 
> > > > Attaching the log to the message into the mailing list should 
> > > > work, but I'm not sure it's supported by the (obsolete) forum 
> > > > interface you are using.  If not, you may put the log at a 
> > > > convinient place and provide a link here, or attach it to a 
> > > > ticket on trac.nginx.org, or email to me privetely.
> > > 
> > > I've attached log file to our S3 public storage. You can download it
> > through
> > > this link:
> > >
> > https://drive-public-eu.s3.eu-central-1.amazonaws.com/nginx/nginx-debu
> > g.csv
> > > 
> > > A note: this is a CSV format from our logging system. I can try to
> > extract
> > > logs in original format if you need.
> > 
> > Thanks, but this doesn't seem to contain anything related to the 
> > SSL_shutdown() except the message itself:
> > 
> > "2020-08-13T15:19:03.279Z","7","shmtx lock",
> > "2020-08-13T15:19:03.279Z","7","shmtx lock",
> > "2020-08-13T15:19:03.279Z","7","timer delta: 0",
> > "2020-08-13T15:19:03.280Z","2","SSL_shutdown() failed (SSL:
> > error:1409F07F:SSL routines:ssl3_write_pending:bad write retry) while
> > closing request, client: XXX.XXX.XXX.XXX, server:
> > XXX.XXX.XXX.XXX:443","9140"
> > "2020-08-13T15:19:03.280Z","7","epoll: fd:322 ev:0005
> > d:00007F0A0FCDDEB0",
> > "2020-08-13T15:19:03.280Z","7","epoll: fd:54 ev:0004
> > d:00007F0A0FCDFAC9",
> > 
> > And nothing else in the log saying "SSL_shutdow()", while there 
> > should be a debug messages like "SSL_shutdown: -1" and 
> > "SSL_get_error: ..." right before the message, and nothing at all 
> > related to the connection 9140.
> > 
> > It looks like the debug logging is only enabled on the global 
> > level, but disabled at http or server level.  Please see the part 
> > starting at "Note that redefining the log without also specifying 
> > the debug level will disable the debugging log" in the "A 
> > debugging log" article 
> > (http://nginx.org/en/docs/debugging_log.html).
> 
> So... Bad news: i cannot capture the event when full debug enabled. Server
> cannot handle the load and our service partially down at that time.
> 
> What can i say is that this problem reveal itself on all servers with new
> nginx version.
> 
> I'll send you the link privately where you can get our (some-redacted)
> config files.

Thank you for your efforts.  Just in case, it is possible to 
configure debug logging only for parts of the load - using the 
debug_connection directive with large networks in CIDR notation 
(http://nginx.org/r/debug_connection).  It's probably not needed 
in this particular case, see below.

I was able to reporoduce an "SSL_shutdown() failed (SSL: ... bad 
write retry)" error at least in one case, similar to the one 
previously observed with SSL_write() in 
https://trac.nginx.org/nginx/ticket/1194.  Previously, this case 
wasn't causing SSL_shutdown() errors, but SSL shutdown fix 
introduced in nginx 1.19.2 revealed the problem.

The following patch should fix this.  It was discussed previously 
as a possible fix for other SSL_shutdown() errors fixed in 1.19.2, 
but wasn't commited as there were concerns it will effectively 
disable SSL shutdown in some unrelated cases where c->error flag 
is misused.  Now it is more or less clear that the change is 
needed.

Patch (it would be great if you'll be able to test if it fixes the 
problem for you):

# HG changeset patch
# User Maxim Dounin <mdounin at mdounin.ru>
# Date 1597531639 -10800
#      Sun Aug 16 01:47:19 2020 +0300
# Node ID be7a3155e00161baf7359ffa73a3a226f1e487c9
# Parent  7d46c9f56c9afe34a38bb3aea99550a2fd884280
SSL: disabled shutdown after connection errors.

This fixes ""SSL_shutdown() failed (SSL: ... bad write retry)" errors
as observed on the second SSL_shutdown() call after SSL shutdown fixes in
09fb2135a589 (1.19.2), notably when sending fails in ngx_http_test_expect(),
similarly to ticket 1194.

Note that there are some places where c->error is misused to prevent
further output, such as ngx_http_v2_finalize_connection() if there
are pending streams, or in filter finalization.  These places seem
to be extreme enough to don't care about missing shutdown though.
For example, filter finalization currently prevents keepalive from
being used.

diff --git a/src/event/ngx_event_openssl.c b/src/event/ngx_event_openssl.c
--- a/src/event/ngx_event_openssl.c
+++ b/src/event/ngx_event_openssl.c
@@ -2793,7 +2793,7 @@ ngx_ssl_shutdown(ngx_connection_t *c)
         return NGX_OK;
     }
 
-    if (c->timedout) {
+    if (c->timedout || c->error) {
         mode = SSL_RECEIVED_SHUTDOWN|SSL_SENT_SHUTDOWN;
         SSL_set_quiet_shutdown(c->ssl->connection, 1);
 



-- 
Maxim Dounin
http://mdounin.ru/


More information about the nginx mailing list