One last try - large long-running worker tasks

Jeff Heisz jmheisz at gmail.com
Tue Nov 10 13:34:03 UTC 2020


Thanks Sergey, that's kind of where I was heading and I've looked at
those, but this is where I get tripped up.  Unlike the 'real'
upstream-dependent instances (which are reading from an upstream
socket instance and can process incoming data in the main event loop,
no locking required), the processing in my module is a bunch of code
doing direct API accesses (no file descriptors exposed) in the task
handler method and generating content.  The final response for the
smaller cases is written in the completion handler.

But because that worker task is running in a separate thread, there's
no safe mechanism I can see to "start" the response and then stream
content from the task handler to the main outbound HTTP response
writer.  What would be ideal (but doesn't exist in nginx core today)
is an 'always' callback in the event loop, in which I could manage the
streaming, but a post task can't insert another post task for the next
loop; it'll get processed in the same pass (I guess I could use a 0
millisecond timer, but that's ugly).  This is why I was looking at a
kernel pipe: I could use that to simulate the upstream, writing the
data through it, and the pipe would carry the data between the task
thread and the main event thread for responding...
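
To make the pipe idea concrete, here is a rough standalone sketch in
plain POSIX C.  It uses no nginx APIs at all, and every name in it
(producer_t, producer_main, stream_through_pipe) is made up for
illustration.  The task thread does blocking writes into the pipe, so
it stalls when the pipe is full; the consumer stands in for the event
loop and polls the non-blocking read end.  In a real module the read
end would presumably be registered with the nginx event loop instead
of calling poll() directly, which is the part I haven't worked out.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names throughout; nothing here is an nginx API. */
typedef struct {
    int    wfd;    /* write end of the pipe, owned by the task thread */
    size_t total;  /* number of bytes the "task" generates */
} producer_t;

/* Task thread: generate content and push it into the pipe.  write() on
 * the blocking write end sleeps while the pipe is full, which throttles
 * the producer to the speed of the consumer. */
static void *producer_main(void *arg)
{
    producer_t *p = arg;
    char        chunk[4096];
    size_t      left = p->total;

    memset(chunk, 'x', sizeof(chunk));

    while (left > 0) {
        size_t  want = left < sizeof(chunk) ? left : sizeof(chunk);
        ssize_t n = write(p->wfd, chunk, want);

        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            break;               /* e.g. EPIPE: the reader went away */
        }
        left -= (size_t) n;
    }

    close(p->wfd);               /* EOF marks the end of the content */
    return NULL;
}

/* Stand-in for the event-loop side: wait for readability, drain what is
 * there, and count the bytes.  Returns the number of bytes received, or
 * (size_t) -1 if the pipe or thread could not be set up. */
static size_t stream_through_pipe(size_t total)
{
    int        fds[2];
    pthread_t  tid;
    producer_t p;
    char       buf[4096];
    size_t     received = 0;

    /* Assumption: SIGPIPE is ignored, so a vanished reader shows up as
     * EPIPE in the producer rather than killing the process. */
    signal(SIGPIPE, SIG_IGN);

    if (pipe(fds) != 0) {
        return (size_t) -1;
    }

    /* Only the read end is non-blocking; the write end stays blocking. */
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK);

    p.wfd = fds[1];
    p.total = total;

    if (pthread_create(&tid, NULL, producer_main, &p) != 0) {
        close(fds[0]);
        close(fds[1]);
        return (size_t) -1;
    }

    for ( ;; ) {
        struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
        ssize_t       n;

        if (poll(&pfd, 1, -1) < 0) {
            if (errno == EINTR) {
                continue;
            }
            break;
        }

        n = read(fds[0], buf, sizeof(buf));

        if (n > 0) {
            received += (size_t) n;  /* a module would send this onward */
        } else if (n == 0) {
            break;                   /* writer closed: end of stream */
        } else if (errno != EAGAIN && errno != EINTR) {
            break;
        }
    }

    close(fds[0]);                   /* unblocks the producer on error */
    pthread_join(tid, NULL);
    return received;
}
```

Calling stream_through_pipe(1 << 20) pushes one megabyte through the
pipe.  The point is only that memory use stays bounded by the pipe
capacity plus one 4096-byte buffer on each side, no matter how much
content the task generates.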

jmh

On Tue, Nov 10, 2020 at 7:48 AM Dipl. Ing. Sergey Brester
<serg.brester at sebres.de> wrote:
>
> You could do it similarly to how the proxy module buffers the response; for instance, see the proxy_buffering directive:
>
> When buffering is enabled, nginx receives a response from the proxied server as soon as possible, saving it into the buffers set by the proxy_buffer_size and proxy_buffers directives. If the whole response does not fit into memory, a part of it can be saved to a temporary file on the disk. Writing to temporary files is controlled by the proxy_max_temp_file_size and proxy_temp_file_write_size directives.
>
> This and other communicating modules (like fcgi, scgi or uwsgi) use upstream buffering of the response. The handling around upstream buffering is almost the same in all of them.
> This is already event-driven: the handler is called when the upstream becomes readable, i.e. on an incoming response chunk (or when the downstream becomes writable).
>
> Basically depending on how your module architecture is built, you could:
>
> - either use the default upstream buffering mechanism (if you have something like an upstream or can simulate one). In this case you have to set certain properties of r->upstream: buffering, buffer_size, bufs.num and bufs.size, temp_file_write_size and max_temp_file_size, and of course register the handler reading the upstream pipe.
> - or organize your own response buffering as it is implemented in ngx_event_pipe.c and ngx_http_upstream.c; take a look there for implementation details.
>
> As for performance (disk I/O, etc.), it depends (buffer size, system cache, mount type of the temp storage, speed of the downstream clients, etc.). But if you configure the buffers large enough, nginx can use them as long as possible, and storing to a temp file can be considered a safe on-demand fallback that smooths out load peaks and avoids an OOM situation.
> Using kernel pipe buffers could surely be faster, but indirectly you'd just relocate the potential OOM issue from the nginx process to the system.
>
> Regards,
> Sergey
>
> 10.11.2020 02:54, Jeff Heisz wrote:
>
> Hi all, I've asked this before with no response, trying one last time
> before I just make something work.
>
> I'm making a custom module for nginx that does a number of things but
> one of the actions is a long-running (in the nginx sense) task that
> could produce a large response.  I've already got proper processing
> around using worker tasks for the other long-running operations that
> have small datasets, but worry about accumulating a large amount of
> memory in a buffer chain for the response.  Ideally it would drain as
> fast as the client can consume it and throttle appropriately, there
> could conceivably be gigabytes of content.
>
> My choices (besides blowing all of the memory in the system) are:
>
> - write to a temporary file and attach a file buffer as the response,
> less than ideal as it's essentially translating a file to begin with,
> so it's a lot of disk I/O and performance will be less than stellar.
> From what I can tell, this is one of the models for the various CGI
> systems for caching, although in my case caching is not of use
>
> - somehow hook into the eventing system of nginx to detect the write
> transitions and implement flow control directly using threading
> conditionals.  I've tried this for a few weeks but can't figure out
> the 'right' thing to make the hooks work in a separate module without
> changing the core nginx code, which I'm loath to do (unless you are
> looking for someone to contribute such a solution, but I'd probably
> need some initial guidance)
>
> - attach a kernel pipe object (yah yah, won't work on Windows, don't
> care) to each of my worker instances and somehow 'connect' that as an
> upstream-like resource, so that the nginx event loop handles the
> read/write consumption and the thread automatically blocks when full
> on the kernel pipe.  Would need some jiggery to handle reuse and
> start/end markers.  Also not clear if I can override the connection
> model for the upstream without again changing core nginx server code
>
> Any thoughts?  Not looking for code here (although telling me to look
> at the blah-blah-blah that does exactly this would be awesome), but it
> would be great if someone who is more familiar with the inner workings
> of the nginx data flow could just say which solution is a non-starter
> (so I don't waste time trying to make it work) or even which would be
> a suitable solution!
>
> jmh

