[PATCH] Do not break downstream connection on proxy_cache write error
Kipras
kiprasm at gmail.com
Mon May 9 13:12:40 UTC 2022
Hello,
Thank you for the very detailed feedback.
A couple of mistakes from my side, when submitting the patch:
Also, this looks irrelevant to the issue being addressed.
Yep, this is not relevant to the issue being addressed, i think i added this
change here by mistake, my bad.
Also, i missed to add another part of the patch, which removes this block
(the
original call to ngx_write_chain_to_temp_file()):
n = ngx_write_chain_to_temp_file(p->temp_file, out);
> if (n == NGX_ERROR) {
> p->cache_write_fail = 1;
> return NGX_ABORT;
> }
I'm guessing you were referring to that block in this comment: (?)
- In case of success, the following code will write the data
> again, resulting in file corruption and likely various other
> issues.
Honestly, all the ins and outs of the function
ngx_event_pipe_write_chain_to_temp_file() are not quite clear to me. But
with
this change the issue of closing the client connection abruptly without
finishing to serve the response (when proxy cache failed to write to disk)
goes
away (at least in my tests).
However, i surely cannot guarantee that nothing else will break :)
(but fwiw i did run the nginx-tests suite with this change, and all tests
pass)
I will do more testing and try to analyze what exactly is happening in this
function.
Thank you.
P.S. if i could ask - what does the p->cacheable flag mean? The
ngx_event_pipe_t
struct is not documented in
http://nginx.org/en/docs/dev/development_guide.html
and it is kinda hard to understand all the pipes/buffers and the buffer
shadowing mechanism/purpose just from reading the code :)
On Tue, May 3, 2022 at 12:02 AM Maxim Dounin <mdounin at mdounin.ru> wrote:
> Hello!
>
> On Mon, May 02, 2022 at 04:36:09PM +0300, Kipras Mancevicius wrote:
>
> > # HG changeset patch
> > # User Kipras Mancevicius <kiprasm at gmail.com>
> > # Date 1651495240 -10800
> > # Mon May 02 15:40:40 2022 +0300
> > # Branch proxy_cache_write_disk_full_fix
> > # Node ID e6e708fd7958987027f6de7748d948e5015ed32b
> > # Parent a736a7a613ea6e182ff86fbadcb98bb0f8891c0b
> > Do not break downstream connection on proxy_cache write error
> >
> > Continue sending response to downstream connection and delete temp cache
> file on
> > upstream connection finalization.
> >
> > diff -r a736a7a613ea -r e6e708fd7958 src/core/ngx_file.c
> > --- a/src/core/ngx_file.c Tue Feb 08 17:35:27 2022 +0300
> > +++ b/src/core/ngx_file.c Mon May 02 15:40:40 2022 +0300
> > @@ -765,6 +765,13 @@
> > ngx_delete_file_n " \"%s\" failed", name);
> >
> > }
> > + } else {
> > + /*
> > + * if the file failed to fully copy (e.g. target disk is
> full) - it
> > + * may remain (partially written) in the target disk.
> Attempt to
> > + * delete it.
> > + */
> > + ngx_delete_file(name);
>
> Trying to delete a file "just in case" looks clearly wrong. If at
> all, this is to be done in ngx_copy_file(), the function which
> created the file and knows what in fact happened. Further, it is
> not clear if it's to be done at all: in many practical cases an
> incomplete file is better than no file at all, and this is a
> generic function.
>
> Also, this looks irrelevant to the issue being addressed.
>
> > }
> >
> > ngx_free(name);
> > diff -r a736a7a613ea -r e6e708fd7958 src/event/ngx_event_pipe.c
> > --- a/src/event/ngx_event_pipe.c Tue Feb 08 17:35:27 2022 +0300
> > +++ b/src/event/ngx_event_pipe.c Mon May 02 15:40:40 2022 +0300
> > @@ -753,10 +753,21 @@
> > out = p->writing;
> > p->writing = NULL;
> >
> > - n = ngx_write_chain_to_temp_file(p->temp_file, NULL);
> > + /*
> > + * stop writing to cache on cache write fail, but keep sending
> response
> > + * downstream. Incomplete temp cache file will be deleted later.
> > + */
> > + if (p->cache_write_fail) {
> > + return NGX_BUSY;
> > + } else {
> > + n = ngx_write_chain_to_temp_file(p->temp_file, NULL);
>
> This looks terribly wrong: if this code will happen actually
> return NGX_BUSY, a thread-based operation won't be completed,
> potentially resulting in a connection hang and/or a socket leak.
>
> >
> > - if (n == NGX_ERROR) {
> > - return NGX_ABORT;
> > + if (n == NGX_ERROR) {
> > + p->cache_write_fail = 1;
> > + p->cacheable = 0;
> > +
> > + return NGX_BUSY;
>
> This looks wrong: if an error happens as a result of a
> thread-based operation, it is incorrect to just ignore this. Data
> being written needs to be recovered somehow.
>
> > + }
> > }
> >
> > goto done;
> > @@ -777,6 +788,23 @@
> > out = p->in;
> > }
> >
> > + /*
> > + * stop writing to cache on cache write fail, but keep sending
> response
> > + * downstream. Incomplete temp cache file will be deleted later.
> > + */
> > + if (p->cache_write_fail) {
> > + return NGX_BUSY;
> > + } else {
> > + n = ngx_write_chain_to_temp_file(p->temp_file, out);
> > +
> > + if (n == NGX_ERROR) {
> > + p->cache_write_fail = 1;
> > + p->cacheable = 0;
> > +
> > + return NGX_BUSY;
> > + }
> > + }
> > +
>
> This also looks wrong:
>
> - The code tries to write everything in out, ignoring things like
> temp_file_write_size and max_temp_file_size.
>
> - In case of success, the following code will write the data
> again, resulting in file corruption and likely various other
> issues.
>
> Further, the following writes (which are the only writes
> meaningful for the current request, as you don't try to update
> p->out) are not really protected by any error checking, so the
> initial objective of the patch does not seem to be achieved.
>
> Please also note that the ticket in question says "without
> introducing unreasonable complexity in the source code" for a
> reason.
>
> Hope this helps.
>
> --
> Maxim Dounin
> http://mdounin.ru/
> _______________________________________________
> nginx-devel mailing list -- nginx-devel at nginx.org
> To unsubscribe send an email to nginx-devel-leave at nginx.org
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.nginx.org/pipermail/nginx-devel/attachments/20220509/8473d235/attachment.htm>
More information about the nginx-devel
mailing list