Streaming responses from long-running requests without descriptors?

Wed May 13 04:18:37 UTC 2020

Let's start with some background.  I'm developing a module to run
inside the NGINX engine to process requests (standard content phase)
against an API (not an upstream-compatible socket-based protocol).
These requests could be long-running so the module is using thread
tasks to execute and generate responses.  I'm well aware of how to
safely manage content in multi-threaded environments and the
threading/pooling/data management constraints of the NGINX internals.
For the most part, I've resolved all of the nuances of handling
"small" requests and responses.  However, the current code handles
responses by accumulating a buffer chain of content to be issued with
output_filter and finalize_request - which is less than optimal for
really large responses which can result from certain parts of the API.

Now for the question - I'm looking for some sample code or basic
direction on how to best implement a streaming consumption model to
both avoid memory overload and provide write-based throttling from the
threaded API processing to the HTTP response instance.  Of course,
there are the obvious threading boundaries where the task instance
cannot call response methods on the request and I know how to handle
inter-thread processing/messaging.  But what I can't figure out is how
to poll/interact with such a transfer mechanism from the main event
thread.  Three thoughts/possible solutions:

- overload the write event handler in the long-running circumstance to
check/consume available chain buffers from the thread processor, with
some kind of signalling/hold mechanism to throttle the thread and
avoid memory overload.  Two issues with this (so far): how to measure
the write backlog in order to throttle the thread and, more
importantly, how to deal with the edge-triggered model of the
underlying event loop so that the write handler actually fires to
check for pending records.  This (to me) is the 'cleanest' model and
the one I've tried to implement, but I have been unable to get the
write handler to fire consistently.

- use a continuous/never-ending 1ms timer to simulate an
every-event-loop-callback to process backlog.  This is even less
elegant: it does fire consistently, but it adds a delay and still has
the write-backlog measurement issue of the previous solution, not to
mention having a single callback manage a collection of responses.
- the third option, which I originally didn't really like for several
reasons but now that I've typed all this out think might be the best
solution, is to bind a kernel pipe and use that to shuttle the content
between the thread and the event loop, as if it were an external
upstream.  I was originally thinking of using the pipe only to 'wake'
the event thread to consume the records, but now realize that I could
just use the pipe to shuttle all of the content.  The only thing I
don't like about this (aside from it seeming to be a much-too-complex
solution to the problem) is the possible overload of pipes for a lot
of concurrent requests.
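
To make the third option concrete, here is a minimal standalone sketch
in plain POSIX C.  It is not NGINX module code: the names producer_t,
producer_main and stream_through_pipe are hypothetical, the poll() loop
only stands in for the event thread, and the place where content would
be handed to the output filter is marked with a comment.  What it is
meant to show is that the pipe gives throttling for free, since the
worker's write() blocks once the kernel pipe buffer is full.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical producer state: stands in for the threaded API task. */
typedef struct {
    int    wfd;    /* blocking write end of the pipe */
    size_t total;  /* bytes of "response" to generate */
} producer_t;

/* Worker thread: generate content and push it into the pipe.  write()
 * blocks while the pipe buffer is full, which throttles the producer. */
static void *producer_main(void *arg)
{
    producer_t *p = arg;
    char        chunk[4096];
    size_t      left = p->total;

    memset(chunk, 'x', sizeof(chunk));
    while (left > 0) {
        size_t  want = left < sizeof(chunk) ? left : sizeof(chunk);
        ssize_t n = write(p->wfd, chunk, want);

        if (n < 0) {
            if (errno == EINTR) continue;
            break;                    /* reader went away (EPIPE) or error */
        }
        assert((size_t) n <= left);
        left -= (size_t) n;
    }
    close(p->wfd);                    /* EOF marks the end of the response */
    return NULL;
}

/* Stand-in for the event thread: wait for readability, drain the pipe,
 * and return the number of bytes consumed (-1 on setup failure). */
static long stream_through_pipe(size_t total)
{
    int        fds[2], flags;
    pthread_t  tid;
    producer_t p;
    char       buf[8192];
    long       consumed = 0;

    /* A write to a closed pipe should fail with EPIPE, not kill us. */
    signal(SIGPIPE, SIG_IGN);

    if (pipe(fds) != 0) return -1;

    /* Only the read end is non-blocking, as an event loop would need. */
    flags = fcntl(fds[0], F_GETFL, 0);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]); close(fds[1]);
        return -1;
    }

    p.wfd = fds[1];
    p.total = total;
    if (pthread_create(&tid, NULL, producer_main, &p) != 0) {
        close(fds[0]); close(fds[1]);
        return -1;
    }

    for (;;) {
        struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
        ssize_t       n;

        if (poll(&pfd, 1, -1) < 0) {
            if (errno == EINTR) continue;
            break;
        }
        n = read(fds[0], buf, sizeof(buf));
        if (n > 0) {
            /* A module would pass buf[0..n) to the output filter here. */
            consumed += n;
        } else if (n == 0) {
            break;                    /* producer closed its end: done */
        } else if (errno != EAGAIN && errno != EINTR) {
            break;
        }
    }

    close(fds[0]);
    pthread_join(tid, NULL);
    return consumed;
}
```

Calling stream_through_pipe(1 << 20) pushes 1 MiB through the pipe and
should return the same count.  In this sketch the reader always drains
as fast as it can; in a real module the read side would presumably stop
reading while the client connection is not writable, so that the
backlog propagates through the pipe to the worker thread.  The cost is
the one already mentioned: two descriptors per in-flight request.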

Any thoughts?  Not looking for someone to provide the code (although
if someone says 'hey, check out this module, it does exactly what you
need' I won't be upset).  Just looking for some guidance from anyone
who might have some ideas on how to best approach this problem...

jmh
