Buffer Reuse in Nginx

Tue Mar 6 16:04:15 UTC 2012

Hi,

I am trying to write an Nginx module that receives input and produces
output in a streaming fashion over a long-lasting connection. Due to
the nature of the connection, I can't wait for all the buffers to be
reclaimed at the end of the request, as that could cause high memory
consumption on the server. As a result, I am looking for a way to
determine whether buffers sent through ngx_http_output_filter() are
safe to reuse.

The problem:
When ngx_http_output_filter() is called frequently enough with large
buffers, it may return NGX_AGAIN, indicating that some buffers are
retained somewhere in the output chain, waiting to be sent later. This
creates a problem for buffer reuse because I see no way to know which
buffers are being retained, or when the retained buffers become free
after being sent. I ran an experiment in which I sent a large amount
of dynamic data through ngx_http_output_filter() and then immediately
modified the buffer data. What I found is that the output becomes
corrupted whenever NGX_AGAIN is returned.
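
To make the failure concrete, here is a minimal sketch of what I
believe is happening. It does not use Nginx at all: model_conn_t,
model_send() and model_flush() are hypothetical stand-ins I made up,
and the key assumption is that the unsent tail is retained by pointer
rather than copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_OK      0
#define MODEL_AGAIN  -2   /* stand-in for NGX_AGAIN */

/* Hypothetical connection whose socket accepts only `window` bytes per try. */
typedef struct {
    char         wire[64];     /* bytes that actually reached the "socket" */
    size_t       wire_len;
    const char  *pending;      /* unsent tail: points into the caller's memory */
    size_t       pending_len;
    size_t       window;
} model_conn_t;

/* Push retained bytes out; the unsent part is never copied anywhere. */
static void
model_flush(model_conn_t *c)
{
    size_t  n = c->pending_len < c->window ? c->pending_len : c->window;

    assert(c->wire_len + n <= sizeof(c->wire));
    memcpy(c->wire + c->wire_len, c->pending, n);
    c->wire_len += n;
    c->pending += n;
    c->pending_len -= n;
}

/* Stand-in for the output call: sends what fits, retains the rest by pointer. */
static int
model_send(model_conn_t *c, const char *data, size_t len)
{
    assert(c->pending_len == 0);   /* this model holds one retained buffer */
    c->pending = data;
    c->pending_len = len;
    model_flush(c);
    return c->pending_len ? MODEL_AGAIN : MODEL_OK;
}

/* Returns 1 if overwriting the buffer after MODEL_AGAIN changed what was sent. */
static int
model_demo_corruption(void)
{
    model_conn_t  c = { .window = 4 };
    char          buf[8];
    int           rc;

    memset(buf, 'A', sizeof(buf));
    rc = model_send(&c, buf, sizeof(buf));   /* "AAAA" sent, 4 bytes retained */
    memset(buf, 'B', sizeof(buf));           /* reuse the buffer too early */
    model_flush(&c);                         /* the retained tail goes out now */

    return rc == MODEL_AGAIN && memcmp(c.wire, "AAAABBBB", 8) == 0;
}
```

In this model the first four bytes go out as 'A', but the retained
tail is sent as 'B' because the buffer was overwritten before the
flush. That matches the corruption I see.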

I found little documentation or other material on buffer reuse when I
searched the Internet. I also tried reading the source code, but I
still find most of it confusing after days of study. So I'd like to
go through a few clues I found and check whether my understanding is
correct. The following are the data structures and functions that I
believe are related:

#define ngx_free_chain(pool, cl)
Unfortunately this macro's name does not reflect exactly what it
does. From my understanding it only recycles the ngx_chain_t object by
pushing it onto the list at pool->chain. The ngx_buf_t object that
cl->buf points to is ignored completely.

ngx_chain_t* ngx_alloc_chain_link(ngx_pool_t *pool)
This function only allocates the ngx_chain_t object, and it first
tries to reuse links that were released by ngx_free_chain(). Because
the buf pointer in the link is ignored, it is not safe to assume that
the ngx_buf_t object and the buffer data can be reused.
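
Here is how I picture these two working together, as a simplified
sketch. The model_* types and functions are my own stand-ins, not the
real Nginx definitions, and malloc() stands in for the pool allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in types; the real ngx_buf_t and ngx_pool_t have many more fields. */
typedef struct { char *pos, *last; } model_buf_t;

typedef struct model_chain_s  model_chain_t;
struct model_chain_s {
    model_buf_t    *buf;
    model_chain_t  *next;
};

typedef struct { model_chain_t *chain; } model_pool_t;

/* Models ngx_free_chain(): push the link onto the pool's list of spare links.
 * Note that cl->buf is neither cleared nor inspected. */
static void
model_free_chain(model_pool_t *pool, model_chain_t *cl)
{
    assert(cl != NULL);
    cl->next = pool->chain;
    pool->chain = cl;
}

/* Models ngx_alloc_chain_link(): reuse a spare link if there is one. */
static model_chain_t *
model_alloc_chain_link(model_pool_t *pool)
{
    model_chain_t  *cl = pool->chain;

    if (cl != NULL) {
        pool->chain = cl->next;
        return cl;                       /* still carries its old buf pointer */
    }

    return malloc(sizeof(model_chain_t)); /* assumption: pool allocation */
}

/* Returns 1 if a recycled link still points at the buffer it had before. */
static int
model_demo_stale_buf(void)
{
    model_pool_t    pool = { NULL };
    model_buf_t     b = { NULL, NULL };
    model_chain_t  *cl, *again;
    int             stale;

    cl = model_alloc_chain_link(&pool);
    if (cl == NULL) {
        return 0;
    }

    cl->buf = &b;
    cl->next = NULL;
    model_free_chain(&pool, cl);

    again = model_alloc_chain_link(&pool);
    stale = (again == cl && again->buf == &b);

    free(again);
    return stale;
}
```

If this model is right, a recycled link says nothing about whether the
buffer it last pointed to has been sent.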

ngx_int_t ngx_output_chain(ngx_output_chain_ctx_t *ctx, ngx_chain_t *in)
It seems to me that most of the buffer-copying magic happens in this
function. I tried hard to follow it but still could not fully
understand what it does. As I understand it, this function copies the
actual buffer data depending on certain flags set on the ngx_buf_t
object, as determined by ngx_output_chain_as_is(). I am not sure
whether I should set any flags to instruct ngx_output_chain() to copy
all buffer data so that I can safely reuse the buffers that I own.

typedef void* ngx_buf_tag_t
This mysterious tag seems to be the way for me to claim ownership of a
buffer, by assigning it a unique pointer value. However, I could find
almost no explanation of how to use the tag field properly. I'd like
to know whether setting this tag guarantees that ownership of the
buffers I create is never shared with other modules.

void ngx_chain_update_chains(ngx_pool_t *p, ngx_chain_t **free,
ngx_chain_t **busy, ngx_chain_t **out, ngx_buf_tag_t tag)
This function seems to do what I want, and it is called in other
modules that have a similar buffer reuse mechanism. However, I am
really confused about its purpose and what exactly it does. From the
signature, it seems to determine which buffers are safe to reuse and
then reclaim them into the **free chain. However, on close inspection
I found that all it does is move all tagged buffers from **busy and
**out to **free, while calling ngx_free_chain() on chain links whose
buffers do not share the same tag. I don't know whether the buffers
freed by this function are guaranteed to be safe to reuse, and I don't
know what happens to the buffers that have different tags.
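
Below is a simplified model of that loop, written with stand-in types
(model_*) rather than the real Nginx definitions. The extra **spare
list stands in for pool->chain. One detail is an assumption from my
reading of the source and may be wrong: the loop appears to stop at
the first buffer that still has unsent bytes (pos != last), which
might be where the "safe to reuse" decision is actually made.

```c
#include <assert.h>
#include <stddef.h>

typedef void *model_tag_t;
typedef struct { char *start, *pos, *last; model_tag_t tag; } model_buf_t;

typedef struct model_chain_s  model_chain_t;
struct model_chain_s {
    model_buf_t    *buf;
    model_chain_t  *next;
};

static void
model_update_chains(model_chain_t **spare, model_chain_t **free_bufs,
    model_chain_t **busy, model_chain_t **out, model_tag_t tag)
{
    model_chain_t  *cl;

    assert(spare && free_bufs && busy && out);

    /* Append everything just passed downstream (*out) to the busy list. */
    if (*busy == NULL) {
        *busy = *out;
    } else {
        for (cl = *busy; cl->next; cl = cl->next) { /* find the tail */ }
        cl->next = *out;
    }
    *out = NULL;

    while (*busy) {
        cl = *busy;

        if (cl->buf->last != cl->buf->pos) {
            break;                      /* assumption: unsent bytes remain */
        }

        *busy = cl->next;

        if (cl->buf->tag != tag) {
            cl->next = *spare;          /* foreign buffer: recycle link only */
            *spare = cl;
            continue;
        }

        cl->buf->pos = cl->buf->start;  /* our buffer: reset and reclaim */
        cl->buf->last = cl->buf->start;
        cl->next = *free_bufs;
        *free_bufs = cl;
    }
}

/* Returns 1 if the lists end up the way the comments above describe. */
static int
model_demo_update(void)
{
    static char     mem[4][8];
    static int      mine, other;        /* only their addresses are used */
    model_buf_t     b[4];
    model_chain_t   l[4];
    model_chain_t  *spare = NULL, *free_bufs = NULL, *busy = NULL, *out = &l[0];
    int             i;

    for (i = 0; i < 4; i++) {           /* four fully sent buffers, my tag */
        b[i].start = mem[i];
        b[i].pos = mem[i] + 8;
        b[i].last = mem[i] + 8;
        b[i].tag = &mine;
        l[i].buf = &b[i];
        l[i].next = (i < 3) ? &l[i + 1] : NULL;
    }
    b[1].tag = &other;                  /* second buffer belongs elsewhere */
    b[2].pos = mem[2] + 3;              /* third buffer has 5 unsent bytes */

    model_update_chains(&spare, &free_bufs, &busy, &out, &mine);

    return out == NULL
           && free_bufs == &l[0] && b[0].pos == b[0].start
           && spare == &l[1]
           && busy == &l[2] && busy->next == &l[3];
}
```

In this model the first buffer is reclaimed, the link of the foreign
buffer is recycled while the buffer itself is left alone, and the
third and fourth buffers stay busy, even though the fourth has already
been sent in full.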

ngx_chain_t * ngx_chain_get_free_buf(ngx_pool_t *p, ngx_chain_t **free)
This function returns the buffers freed by ngx_chain_update_chains().
Most modules seem to overwrite the data in the buffer they obtain
without any issue. That makes me wonder whether
ngx_chain_update_chains() really works.
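
A simplified model of how I read this function, again with stand-in
types and malloc()/calloc() in place of the pool allocator. The part
about a fresh buffer coming back zeroed, with no data memory attached,
is my assumption from reading the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { char *start, *pos, *last; void *tag; } model_buf_t;

typedef struct model_chain_s  model_chain_t;
struct model_chain_s {
    model_buf_t    *buf;
    model_chain_t  *next;
};

/* Pop a reclaimed link from the free list, or create a fresh empty one. */
static model_chain_t *
model_get_free_buf(model_chain_t **free_bufs)
{
    model_chain_t  *cl;

    assert(free_bufs != NULL);

    if (*free_bufs != NULL) {
        cl = *free_bufs;
        *free_bufs = cl->next;
        cl->next = NULL;
        return cl;                      /* buffer keeps its old memory */
    }

    cl = malloc(sizeof(model_chain_t));
    if (cl == NULL) {
        return NULL;
    }

    cl->buf = calloc(1, sizeof(model_buf_t));  /* zeroed: start is NULL */
    if (cl->buf == NULL) {
        free(cl);
        return NULL;
    }

    cl->next = NULL;
    return cl;
}

/* Returns 1 if a fresh buffer is empty and a reclaimed one is handed back. */
static int
model_demo_get_free_buf(void)
{
    model_chain_t  *free_bufs = NULL, *first, *second;
    int             ok;

    first = model_get_free_buf(&free_bufs);
    if (first == NULL) {
        return 0;
    }

    ok = (first->buf->start == NULL);   /* fresh: no data memory yet */

    first->next = free_bufs;            /* pretend it was sent and reclaimed */
    free_bufs = first;

    second = model_get_free_buf(&free_bufs);
    ok = ok && (second == first) && (free_bufs == NULL);

    free(second->buf);
    free(second);
    return ok;
}
```

If that is accurate, the function itself performs no safety check: it
hands back whatever is on the free list, so any guarantee has to come
from the code that put the buffer there.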

ngx_chain_t * ngx_connection_s::send_chain(ngx_connection_t *c,
ngx_chain_t *in, off_t limit)
The function pointer at r->connection->send_chain returns the part of
the buffer chain that it has not yet sent. I also found that the
returned chain is then stored in r->out, waiting to be sent next time.
So it seems that I can determine whether my buffers are safe to reuse
by checking whether the chain links at r->out point to the same buffer
data. However, I am not sure whether relying solely on this method is
really safe, especially if there are filter modules that retain
buffers in their own context.
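
The check I have in mind would look roughly like the sketch below.
model_chain_uses() is a hypothetical helper, not an Nginx function,
and the types are stand-ins. It only inspects the one chain it is
given, so it shares the weakness I describe above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { char *pos, *last; } model_buf_t;

typedef struct model_chain_s  model_chain_t;
struct model_chain_s {
    model_buf_t    *buf;
    model_chain_t  *next;
};

/* Returns 1 if any link in `pending` has unsent bytes inside [start, end).
 * Addresses are compared as integers because the ranges may belong to
 * unrelated allocations. */
static int
model_chain_uses(const model_chain_t *pending, const char *start,
    const char *end)
{
    const model_chain_t  *cl;
    uintptr_t             s = (uintptr_t) start, e = (uintptr_t) end;
    uintptr_t             pos, last;

    assert(s <= e);

    for (cl = pending; cl != NULL; cl = cl->next) {
        pos = (uintptr_t) cl->buf->pos;
        last = (uintptr_t) cl->buf->last;

        if (pos < last && pos < e && last > s) {
            return 1;                   /* unsent data overlaps the range */
        }
    }

    return 0;
}

/* Returns 1 if only the memory block with unsent data is reported in use. */
static int
model_demo_pending_check(void)
{
    static char    a[16], b[16];
    model_buf_t    tail = { a + 4, a + 16 };   /* 12 unsent bytes of a */
    model_chain_t  pending = { &tail, NULL };

    return model_chain_uses(&pending, a, a + 16) == 1
           && model_chain_uses(&pending, b, b + 16) == 0;
}
```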

ngx_int_t ngx_http_output_filter(ngx_http_request_t *r, ngx_chain_t *in)
Given all the clues above, I just wish to know the right way to
determine whether buffers are safe to reuse after
ngx_http_output_filter() has been called. Can I just set buf->tag? Or
should I check r->out? Or should I call ngx_chain_get_free_buf()?

I am sorry if I have misunderstood the internals or used the wrong
terms in describing my findings. I would love to spend more time
studying the Nginx source code, but it has been weeks and I really
need to proceed with my module development. I'd appreciate it if
anyone could correct my misunderstandings and explain what exactly
happens when buffers are being sent.

Thanks!

References:
http://mailman.nginx.org/pipermail/nginx-devel/2011-September/001180.html
http://mailman.nginx.org/pipermail/nginx/2010-April/019814.html

Regards,

Soares