SSL_shutdown() failed (SSL: ... bad write retry)

Maxim Dounin mdounin at mdounin.ru
Wed Aug 19 16:50:54 UTC 2020


Hello!

On Tue, Aug 18, 2020 at 05:35:16AM -0400, vergil wrote:

> Maxim Dounin Wrote:
> -------------------------------------------------------
> > Hello!
> > 
> > On Fri, Aug 14, 2020 at 10:34:49AM -0400, vergil wrote:
> > 
> > > Maxim Dounin Wrote:
> > > -------------------------------------------------------
> > > > Hello!
> > > > 
> > > > On Thu, Aug 13, 2020 at 12:11:54PM -0400, vergil wrote:
> > > > 
> > > > > Maxim Dounin Wrote:
> > > > > -------------------------------------------------------
> > > > > > Hello!
> > > > > > 
> > > > > > On Thu, Aug 13, 2020 at 11:39:36AM -0400, vergil wrote:
> > > > > > 
> > > > > > > This one was hard to catch.
> > > > > > > 
> > > > > > > I've captured one error with 30 seconds delta before and
> > after
> > > > the
> > > > > > event.
> > > > > > > Where can i attach log file for you? There's 400K messages,
> > so i
> > > > > > cannot
> > > > > > > simple put it here.
> > > > > > 
> > > > > > Attaching the log to the message into the mailing list should 
> > > > > > work, but I'm not sure it's supported by the (obsolete) forum 
> > > > > > interface you are using.  If not, you may put the log at a 
> > > > > > convinient place and provide a link here, or attach it to a 
> > > > > > ticket on trac.nginx.org, or email to me privetely.
> > > > > 
> > > > > I've attached log file to our S3 public storage. You can
> > download it
> > > > through
> > > > > this link:
> > > > >
> > > >
> > https://drive-public-eu.s3.eu-central-1.amazonaws.com/nginx/nginx-debu
> > > > g.csv
> > > > > 
> > > > > A note: this is a CSV format from our logging system. I can try
> > to
> > > > extract
> > > > > logs in original format if you need.
> > > > 
> > > > Thanks, but this doesn't seem to contain anything related to the 
> > > > SSL_shutdown() except the message itself:
> > > > 
> > > > "2020-08-13T15:19:03.279Z","7","shmtx lock",
> > > > "2020-08-13T15:19:03.279Z","7","shmtx lock",
> > > > "2020-08-13T15:19:03.279Z","7","timer delta: 0",
> > > > "2020-08-13T15:19:03.280Z","2","SSL_shutdown() failed (SSL:
> > > > error:1409F07F:SSL routines:ssl3_write_pending:bad write retry)
> > while
> > > > closing request, client: XXX.XXX.XXX.XXX, server:
> > > > XXX.XXX.XXX.XXX:443","9140"
> > > > "2020-08-13T15:19:03.280Z","7","epoll: fd:322 ev:0005
> > > > d:00007F0A0FCDDEB0",
> > > > "2020-08-13T15:19:03.280Z","7","epoll: fd:54 ev:0004
> > > > d:00007F0A0FCDFAC9",
> > > > 
> > > > And nothing else in the log saying "SSL_shutdow()", while there 
> > > > should be a debug messages like "SSL_shutdown: -1" and 
> > > > "SSL_get_error: ..." right before the message, and nothing at all 
> > > > related to the connection 9140.
> > > > 
> > > > It looks like the debug logging is only enabled on the global 
> > > > level, but disabled at http or server level.  Please see the part 
> > > > starting at "Note that redefining the log without also specifying 
> > > > the debug level will disable the debugging log" in the "A 
> > > > debugging log" article 
> > > > (http://nginx.org/en/docs/debugging_log.html).
> > > 
> > > So... Bad news: i cannot capture the event when full debug enabled.
> > Server
> > > cannot handle the load and our service partially down at that time.
> > > 
> > > What can i say is that this problem reveal itself on all servers
> > with new
> > > nginx version.
> > > 
> > > I'll send you the link privately where you can get our
> > (some-redacted)
> > > config files.
> > 
> > Thank you for your efforts.  Just in case, it is possible to 
> > configure debug logging only for parts of the load - using the 
> > debug_connection directive with large networks in CIDR notation 
> > (http://nginx.org/r/debug_connection).  It's probably not needed 
> > in this particular case, see below.
> > 
> > I was able to reporoduce an "SSL_shutdown() failed (SSL: ... bad 
> > write retry)" error at least in one case, similar to the one 
> > previously observed with SSL_write() in 
> > https://trac.nginx.org/nginx/ticket/1194.  Previously, this case 
> > wasn't causing SSL_shutdown() errors, but SSL shutdown fix 
> > introduced in nginx 1.19.2 revealed the problem.
> > 
> > The following patch should fix this.  It was discussed previously 
> > as a possible fix for other SSL_shutdown() errors fixed in 1.19.2, 
> > but wasn't commited as there were concerns it will effectively 
> > disable SSL shutdown in some unrelated cases where c->error flag 
> > is misused.  Now it is more or less clear that the change is 
> > needed.
> > 
> > Patch (it would be great if you'll be able to test if it fixes the 
> > problem for you):
> > 
> > # HG changeset patch
> > # User Maxim Dounin <mdounin at mdounin.ru>
> > # Date 1597531639 -10800
> > #      Sun Aug 16 01:47:19 2020 +0300
> > # Node ID be7a3155e00161baf7359ffa73a3a226f1e487c9
> > # Parent  7d46c9f56c9afe34a38bb3aea99550a2fd884280
> > SSL: disabled shutdown after connection errors.
> > 
> > This fixes ""SSL_shutdown() failed (SSL: ... bad write retry)" errors
> > as observed on the second SSL_shutdown() call after SSL shutdown fixes
> > in
> > 09fb2135a589 (1.19.2), notably when sending fails in
> > ngx_http_test_expect(),
> > similarly to ticket 1194.
> > 
> > Note that there are some places where c->error is misused to prevent
> > further output, such as ngx_http_v2_finalize_connection() if there
> > are pending streams, or in filter finalization.  These places seem
> > to be extreme enough to don't care about missing shutdown though.
> > For example, filter finalization currently prevents keepalive from
> > being used.
> > 
> > diff --git a/src/event/ngx_event_openssl.c
> > b/src/event/ngx_event_openssl.c
> > --- a/src/event/ngx_event_openssl.c
> > +++ b/src/event/ngx_event_openssl.c
> > @@ -2793,7 +2793,7 @@ ngx_ssl_shutdown(ngx_connection_t *c)
> >          return NGX_OK;
> >      }
> >  
> > -    if (c->timedout) {
> > +    if (c->timedout || c->error) {
> >          mode = SSL_RECEIVED_SHUTDOWN|SSL_SENT_SHUTDOWN;
> >          SSL_set_quiet_shutdown(c->ssl->connection, 1);
> >  
> > 
> 
> Good day.
> 
> Patch mentioned above solved half the problems.
> 
> SSL_shutdown() failed (SSL: error:1409F07F:SSL
> routines:ssl3_write_pending:bad write retry) while processing HTTP/2
> connection
> 
> Still remains in the logs.

Do you see any other errors on the same connection before the 
SSL_shutdown() error?  As suggested previously, somethig relevant 
might be logged at the "info" level.  Note that seeing info-level 
error messages will probably require error logging to be 
reconfigured, much like with debug.

If there is nothing, I'm afraid the only solution would be to try 
to catch a debugging log related to these errors.  As previously 
suggested, this can be done without too much load by using the 
debug_connection with relatively large CIDR blocks and waiting for 
the error to happen from with a client from one of these blocks.

-- 
Maxim Dounin
http://mdounin.ru/


More information about the nginx mailing list