SSL_shutdown() failed (SSL: ... bad write retry)
Maxim Dounin
mdounin at mdounin.ru
Wed Aug 19 16:50:54 UTC 2020
Hello!
On Tue, Aug 18, 2020 at 05:35:16AM -0400, vergil wrote:
> Maxim Dounin Wrote:
> -------------------------------------------------------
> > Hello!
> >
> > On Fri, Aug 14, 2020 at 10:34:49AM -0400, vergil wrote:
> >
> > > Maxim Dounin Wrote:
> > > -------------------------------------------------------
> > > > Hello!
> > > >
> > > > On Thu, Aug 13, 2020 at 12:11:54PM -0400, vergil wrote:
> > > >
> > > > > Maxim Dounin Wrote:
> > > > > -------------------------------------------------------
> > > > > > Hello!
> > > > > >
> > > > > > On Thu, Aug 13, 2020 at 11:39:36AM -0400, vergil wrote:
> > > > > >
> > > > > > > This one was hard to catch.
> > > > > > >
> > > > > > > I've captured one error with 30 seconds delta before and
> > after
> > > > the
> > > > > > event.
> > > > > > > Where can i attach log file for you? There's 400K messages,
> > so i
> > > > > > cannot
> > > > > > > simple put it here.
> > > > > >
> > > > > > Attaching the log to the message into the mailing list should
> > > > > > work, but I'm not sure it's supported by the (obsolete) forum
> > > > > > interface you are using. If not, you may put the log at a
> > > > > > convinient place and provide a link here, or attach it to a
> > > > > > ticket on trac.nginx.org, or email to me privetely.
> > > > >
> > > > > I've attached log file to our S3 public storage. You can
> > download it
> > > > through
> > > > > this link:
> > > > >
> > > >
> > https://drive-public-eu.s3.eu-central-1.amazonaws.com/nginx/nginx-debu
> > > > g.csv
> > > > >
> > > > > A note: this is a CSV format from our logging system. I can try
> > to
> > > > extract
> > > > > logs in original format if you need.
> > > >
> > > > Thanks, but this doesn't seem to contain anything related to the
> > > > SSL_shutdown() except the message itself:
> > > >
> > > > "2020-08-13T15:19:03.279Z","7","shmtx lock",
> > > > "2020-08-13T15:19:03.279Z","7","shmtx lock",
> > > > "2020-08-13T15:19:03.279Z","7","timer delta: 0",
> > > > "2020-08-13T15:19:03.280Z","2","SSL_shutdown() failed (SSL:
> > > > error:1409F07F:SSL routines:ssl3_write_pending:bad write retry)
> > while
> > > > closing request, client: XXX.XXX.XXX.XXX, server:
> > > > XXX.XXX.XXX.XXX:443","9140"
> > > > "2020-08-13T15:19:03.280Z","7","epoll: fd:322 ev:0005
> > > > d:00007F0A0FCDDEB0",
> > > > "2020-08-13T15:19:03.280Z","7","epoll: fd:54 ev:0004
> > > > d:00007F0A0FCDFAC9",
> > > >
> > > > And nothing else in the log saying "SSL_shutdow()", while there
> > > > should be a debug messages like "SSL_shutdown: -1" and
> > > > "SSL_get_error: ..." right before the message, and nothing at all
> > > > related to the connection 9140.
> > > >
> > > > It looks like the debug logging is only enabled on the global
> > > > level, but disabled at http or server level. Please see the part
> > > > starting at "Note that redefining the log without also specifying
> > > > the debug level will disable the debugging log" in the "A
> > > > debugging log" article
> > > > (http://nginx.org/en/docs/debugging_log.html).
> > >
> > > So... Bad news: i cannot capture the event when full debug enabled.
> > Server
> > > cannot handle the load and our service partially down at that time.
> > >
> > > What can i say is that this problem reveal itself on all servers
> > with new
> > > nginx version.
> > >
> > > I'll send you the link privately where you can get our
> > (some-redacted)
> > > config files.
> >
> > Thank you for your efforts. Just in case, it is possible to
> > configure debug logging only for parts of the load - using the
> > debug_connection directive with large networks in CIDR notation
> > (http://nginx.org/r/debug_connection). It's probably not needed
> > in this particular case, see below.
> >
> > I was able to reporoduce an "SSL_shutdown() failed (SSL: ... bad
> > write retry)" error at least in one case, similar to the one
> > previously observed with SSL_write() in
> > https://trac.nginx.org/nginx/ticket/1194. Previously, this case
> > wasn't causing SSL_shutdown() errors, but SSL shutdown fix
> > introduced in nginx 1.19.2 revealed the problem.
> >
> > The following patch should fix this. It was discussed previously
> > as a possible fix for other SSL_shutdown() errors fixed in 1.19.2,
> > but wasn't commited as there were concerns it will effectively
> > disable SSL shutdown in some unrelated cases where c->error flag
> > is misused. Now it is more or less clear that the change is
> > needed.
> >
> > Patch (it would be great if you'll be able to test if it fixes the
> > problem for you):
> >
> > # HG changeset patch
> > # User Maxim Dounin <mdounin at mdounin.ru>
> > # Date 1597531639 -10800
> > # Sun Aug 16 01:47:19 2020 +0300
> > # Node ID be7a3155e00161baf7359ffa73a3a226f1e487c9
> > # Parent 7d46c9f56c9afe34a38bb3aea99550a2fd884280
> > SSL: disabled shutdown after connection errors.
> >
> > This fixes ""SSL_shutdown() failed (SSL: ... bad write retry)" errors
> > as observed on the second SSL_shutdown() call after SSL shutdown fixes
> > in
> > 09fb2135a589 (1.19.2), notably when sending fails in
> > ngx_http_test_expect(),
> > similarly to ticket 1194.
> >
> > Note that there are some places where c->error is misused to prevent
> > further output, such as ngx_http_v2_finalize_connection() if there
> > are pending streams, or in filter finalization. These places seem
> > to be extreme enough to don't care about missing shutdown though.
> > For example, filter finalization currently prevents keepalive from
> > being used.
> >
> > diff --git a/src/event/ngx_event_openssl.c
> > b/src/event/ngx_event_openssl.c
> > --- a/src/event/ngx_event_openssl.c
> > +++ b/src/event/ngx_event_openssl.c
> > @@ -2793,7 +2793,7 @@ ngx_ssl_shutdown(ngx_connection_t *c)
> > return NGX_OK;
> > }
> >
> > - if (c->timedout) {
> > + if (c->timedout || c->error) {
> > mode = SSL_RECEIVED_SHUTDOWN|SSL_SENT_SHUTDOWN;
> > SSL_set_quiet_shutdown(c->ssl->connection, 1);
> >
> >
>
> Good day.
>
> Patch mentioned above solved half the problems.
>
> SSL_shutdown() failed (SSL: error:1409F07F:SSL
> routines:ssl3_write_pending:bad write retry) while processing HTTP/2
> connection
>
> Still remains in the logs.
Do you see any other errors on the same connection before the
SSL_shutdown() error? As suggested previously, somethig relevant
might be logged at the "info" level. Note that seeing info-level
error messages will probably require error logging to be
reconfigured, much like with debug.
If there is nothing, I'm afraid the only solution would be to try
to catch a debugging log related to these errors. As previously
suggested, this can be done without too much load by using the
debug_connection with relatively large CIDR blocks and waiting for
the error to happen from with a client from one of these blocks.
--
Maxim Dounin
http://mdounin.ru/
More information about the nginx
mailing list