Recovering from partial writes

Fri Jun 22 18:40:39 UTC 2018

I have an nginx proxy through which clients pass a large POST payload to 
the upstream server. Sometimes, the upstream server is slow and so 
writing the POST data will fail with a writev() not ready (EAGAIN) 
error. But of course, that's a very common situation when dealing with 
non-blocking I/O, and I'd expect the rest of the data to be written when 
the socket is again ready for writing.
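
To make that expectation concrete, here is a minimal Python sketch of the 
pattern I mean. It is not nginx code: send_pending and transfer are 
hypothetical names, and a local socket pair with a deliberately slow 
reader stands in for the upstream connection. The point is only that the 
unsent tail is kept after EAGAIN and written once the socket is writable 
again.

```python
import select
import socket


def send_pending(sock, data, offset):
    """Write as much of data[offset:] as the socket accepts; return the new offset.

    Stops quietly on EAGAIN (BlockingIOError). The unsent tail stays in
    `data`, so a later call can resume from the returned offset.
    """
    view = memoryview(data)
    while offset < len(data):
        try:
            offset += sock.send(view[offset:])
        except BlockingIOError:
            break  # EAGAIN: socket buffer is full, keep the remainder for later
    return offset


def transfer(payload):
    """Push payload through a socket pair whose reader is deliberately slow.

    Returns the bytes the reader received and the number of times the
    writer had to stop with data still pending.
    """
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    reader.setblocking(False)
    received = bytearray()
    offset = 0
    stalls = 0
    try:
        while len(received) < len(payload):
            if offset < len(payload):
                # Only write when the socket reports that it is writable.
                _, writable, _ = select.select([], [writer], [], 0)
                if writable:
                    offset = send_pending(writer, payload, offset)
                    if offset < len(payload):
                        stalls += 1  # partial write: the tail is still pending
            # Slow reader: drain at most 4096 bytes per loop iteration.
            try:
                received += reader.recv(4096)
            except BlockingIOError:
                pass
    finally:
        writer.close()
        reader.close()
    return bytes(received), stalls
```

With a payload larger than the socket buffer, the writer stalls at least 
once, yet the reader still ends up with every byte, because the offset 
into the payload survives each EAGAIN.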

In fact, it seems like the basic structure of that is in place; when 
ngx_writev gets the EAGAIN, it passes that to calling functions, which 
modify the chain buffers. Yet somewhere along the line (seemingly in 
ngx_http_upstream_send_request_body) the partially-written buffer is 
freed, and although the socket later indicates that it is ready to write 
(and the ngx epoll module does detect that), there is no longer any data 
to write and so everything fails.

I realize this is not the dev mailing list, so an explanation of how 
this is implemented isn't necessarily what I'm after. A partial write to 
a socket is so common that I doubt I'm the first to run into this, and 
a bug this basic seems unlikely, so I assume that something else is 
going on. I have tried this with proxy_request_buffering both off and 
on, and the failure is essentially the same. The http section of my conf 
looks like this:

http {
     max_ranges 1;
     #map $http_accept $file_extension {
     #   default   ".html";
     #    "~*json"  ".json";
     #}
     map $http_upgrade $connection_upgrade {
         default upgrade;
         '' "";
     }
     server_names_hash_bucket_size 512;
     server_names_hash_max_size 2048;
     variables_hash_bucket_size 512;
     variables_hash_max_size 2048;
     client_header_buffer_size 8k;
     large_client_header_buffers 4 16k;
     proxy_buffering off;
     proxy_request_buffering off; # Tried on, and various sizes
     #proxy_buffer_size 16k;
     #proxy_buffers 4 128k;
     #proxy_busy_buffers_size 256k;
     #proxy_headers_hash_bucket_size 256;
     client_max_body_size 0;
     ssl_session_cache shared:SSL:20m;
     ssl_session_timeout 60m;

     include       /u01/data/config/nginx/mime.types;
     default_type  application/octet-stream;

     log_format  main  '"$remote_addr" "-" "$remote_user" "[$time_local]" "$request" '
                       '"$status" "$body_bytes_sent" "$http_referer" '
                       '"$http_user_agent" "$http_x_forwarded_for"';

     log_format  opcroutingtier  '"$remote_addr" "-" "$remote_user" [$time_local] "$request" "$status" '
                                 '"$body_bytes_sent" "$http_referer" "$http_user_agent" "$bytes_sent" "$request_length" "-" '
                                 '"$host" "$http_x_forwarded_for" "$server_name" "$server_port" "$request_time" "$upstream_addr" '
                                 '"$upstream_connect_time" "$upstream_header_time" "$upstream_response_time" "$upstream_status" "$ssl_cipher" "$ssl_protocol" '
                                 '"-" "-" "-"';

     access_log  /u01/data/logs/nginx_logs/access_logs/access.log  opcroutingtier;
     sendfile        off;  # also tried on
     keepalive_timeout 60s;
     keepalive_requests 2000000;
     open_file_cache max=2000 inactive=20s;
     open_file_cache_valid 60s;
     open_file_cache_min_uses 5;
     open_file_cache_errors off;
     gzip on;
     gzip_types text/plain text/css text/javascript text/xml application/x-javascript application/xml;
     gzip_min_length 500;
     gzip_comp_level 7;

Everything works fine if the upstream reads data fast enough; it's only 
when nginx gets a partial write upstream that there is a problem. Am I 
missing something here?

-Scott