Buffer Reuse in Nginx

Tue Mar 6 17:26:44 UTC 2012

Hello!

On Tue, Mar 06, 2012 at 05:04:15PM +0100, Soares Chen wrote:

> Hi,
> 
> I am trying to write an Nginx module that receives input and produces
> output in a streaming fashion over long-lasting connections. Due to the
> nature of the connection, I can't wait for all the buffers to be
> reclaimed at the end of the request, as that would potentially cause
> high memory consumption on the server. As a result, I am looking for
> ways to determine if buffers sent through ngx_http_output_filter() are
> safe to be reused.
> 
> The problem:
> When ngx_http_output_filter() is called frequently enough with large
> buffers, it is possible that NGX_AGAIN would be returned, indicating
> that some buffers are retained somewhere in the output chain waiting
> to be sent later. This creates a problem for buffer reuse because there
> is no way to know which buffers are being retained, and when the
> retained buffers will become free after being sent. I tried an
> experiment: sending a large amount of dynamic data through
> ngx_http_output_filter() and then immediately modifying the buffer
> data. What I found is that the data would become corrupted when
> NGX_AGAIN is returned.
> 
> 
> I found a lack of documentation and resources about buffer reuse when
> I searched through the Internet. I also tried reading the source code
> but found most parts confusing even after days of studying. So I'd like
> to clarify a few clues that I found to determine if my understanding
> is correct. Following are the data structures and functions that I
> believe are related:
> 
> 
> #define ngx_free_chain(pool, cl)
> Unfortunately this function's name does not reflect what it does
> exactly. From my understanding it only recycles the ngx_chain_t object
> by placing the pointer into pool->chain. Even the ngx_buf_t object
> that cl->buf points to is completely ignored.
> 
> 
> ngx_chain_t* ngx_alloc_chain_link(ngx_pool_t *pool)
> This function only allocates the ngx_chain_t object and will try to
> reuse the chains that were freed by ngx_free_chain(). Because the buf
> pointer in the chain is ignored it is not safe to assume that the
> ngx_buf_t object and the buffer data can be reused.

Correct, these two functions deal with chain links, and they 
completely ignore any possible content of the structures.  They 
are basically equivalent to

    ngx_palloc(pool, sizeof(ngx_chain_t));
    ngx_pfree(pool, cl);
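
As a rough illustration of why the buf pointer is ignored, here is a
simplified standalone model of the two routines. The names pool_t,
chain_t, buf_t, alloc_chain_link() and free_chain() are stand-ins
invented for this sketch, and malloc() stands in for ngx_palloc(); it
is not the actual nginx source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for the nginx types (assumptions of this sketch). */
typedef struct { int unused; } buf_t;

typedef struct chain_s chain_t;
struct chain_s {
    buf_t   *buf;
    chain_t *next;
};

typedef struct {
    chain_t *chain;   /* recycled links, playing the role of pool->chain */
} pool_t;

/* Models ngx_alloc_chain_link(): reuse a recycled link if one exists. */
chain_t *
alloc_chain_link(pool_t *pool)
{
    chain_t *cl;

    assert(pool != NULL);

    cl = pool->chain;
    if (cl != NULL) {
        pool->chain = cl->next;
        return cl;              /* cl->buf still holds its old value */
    }

    return malloc(sizeof(chain_t));   /* stands in for ngx_palloc() */
}

/* Models ngx_free_chain(): push the link back, never looking at cl->buf. */
void
free_chain(pool_t *pool, chain_t *cl)
{
    assert(pool != NULL && cl != NULL);

    cl->next = pool->chain;
    pool->chain = cl;
}
```

A recycled link may therefore come back with a stale buf pointer, so
the caller has to assign cl->buf itself before using the link.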

> ngx_int_t ngx_output_chain(ngx_output_chain_ctx_t *ctx, ngx_chain_t *in)
> It seems to me that most of the buffer copying magic happens in this
> function. I tried hard to follow it but still could not fully
> understand what the function does. As I understand this function would
> copy the actual buffer data if the ngx_buf_t object has some specific
> flags set as determined by ngx_output_chain_as_is(). I am not sure
> whether I should set any flags to instruct ngx_output_chain() to copy
> all buffer data so that I can safely reuse the buffers that I own.

You shouldn't instruct it to copy anything.

Instead, you should reuse your own buffers once they have been
freed, via ngx_chain_update_chains() and friends.  See below.

> typedef void* ngx_buf_tag_t
> This mysterious tag seems to be the way for me to claim ownership of a
> buffer by assigning it a unique pointer value. However I could find
> almost no explanation on how to use this tag field properly. I'd like
> to know if setting this tag would guarantee that the buffers I created
> would never have their ownership shared with other modules?

It is used by ngx_chain_update_chains() to match buffers allocated 
by your module.

> void ngx_chain_update_chains(ngx_pool_t *p, ngx_chain_t **free,
> ngx_chain_t **busy, ngx_chain_t **out, ngx_buf_tag_t tag)
> I find that this function seems to be performing what I want, and it
> seems to be called in other modules that have similar buffer reuse
> mechanism. However I am really confused about the purpose of this
> function and what it does exactly. From the signature it seems to be
> determining which buffers are safe to reuse, and then reclaim the free
> buffers into the **free chain. However on close inspection I found
> that all it does is to move all tagged buffers at **busy and **out to
> free, while calling ngx_free_chain() on buffer chains that do not
> share the same tag. I don't know if the buffers freed by this function
> are guaranteed safe to be reused, and I don't know what happens with the
> buffers that have different tags.

Buffers are only moved to **free if they are indeed free, i.e. when
ngx_buf_size(cl->buf) == 0.  Buffers from **free will then be
reused either with ngx_chain_get_free_buf() or with your own code.
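
To make the rule concrete, here is a simplified standalone model of
that bookkeeping. The types and the update_chains() name are invented
for this sketch, the pool argument is left out, and buf_size() only
covers in-memory buffers (last - pos); treat it as an approximation of
the behaviour described above, not as the nginx implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for ngx_buf_t / ngx_chain_t (assumptions). */
typedef struct {
    unsigned char *start;
    unsigned char *pos;    /* advanced by whoever sends the data */
    unsigned char *last;
    void          *tag;    /* identifies the module that owns the buffer */
} buf_t;

typedef struct chain_s chain_t;
struct chain_s {
    buf_t   *buf;
    chain_t *next;
};

/* Bytes still unsent in an in-memory buffer. */
size_t
buf_size(const buf_t *b)
{
    return (size_t) (b->last - b->pos);
}

void
update_chains(chain_t **free_list, chain_t **busy, chain_t **out, void *tag)
{
    chain_t *cl;

    assert(free_list != NULL && busy != NULL && out != NULL);

    /* Everything just passed downstream becomes "busy". */
    if (*out != NULL) {
        if (*busy == NULL) {
            *busy = *out;
        } else {
            for (cl = *busy; cl->next != NULL; cl = cl->next) { /* void */ }
            cl->next = *out;
        }
        *out = NULL;
    }

    /* Release links from the head of busy until one still holds data. */
    while (*busy != NULL) {
        cl = *busy;

        if (buf_size(cl->buf) != 0) {
            break;              /* not fully sent yet, stop here */
        }

        *busy = cl->next;

        if (cl->buf->tag != tag) {
            cl->next = NULL;    /* foreign buffer: only the link is dropped */
            continue;
        }

        /* Our own buffer, fully sent: reset it and put it on the free list. */
        cl->buf->pos = cl->buf->start;
        cl->buf->last = cl->buf->start;
        cl->next = *free_list;
        *free_list = cl;
    }
}
```

In this model a buffer with a foreign tag never reaches the free list;
only its chain link is released, and the buffer itself remains the
business of whichever module created it.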

> ngx_chain_t * ngx_chain_get_free_buf(ngx_pool_t *p, ngx_chain_t **free)
> I find that this function will return the buffers freed by
> ngx_chain_update_chains(). Most modules seem to overwrite the data
> on the obtained buffer without any issue. That makes me wonder if
> ngx_chain_update_chains() really works.

See above.

> ngx_chain_t * ngx_connection_s::send_chain(ngx_connection_t *c,
> ngx_chain_t *in, off_t limit)
> The function pointer at r->connection->send_chain would return the
> buffer chain that it has not yet sent. I also found that the returned
> chain is then stored in r->out waiting to be sent next time. So it
> seems like I can determine if my buffers are safe for reuse by
> checking if the buffer chains at r->out point to the same buffer data.
> However I am not sure if relying solely on this method is really safe,
> especially if there are filter modules that retain buffers in their
> own context.

No, this isn't the correct approach.  Don't touch r->out, use
ngx_chain_update_chains() instead, it will do the needed work for you.

> ngx_int_t ngx_http_output_filter(ngx_http_request_t *r, ngx_chain_t *in)
> After so many clues that I stated above, I just wish to know what is
> really the right way to determine if buffers are safe for reuse after
> this function, ngx_http_output_filter(), is called. Can I just set
> buf->tag? Or should I check r->out? Or should I call
> ngx_chain_get_free_buf()?

You should really use ngx_chain_update_chains().  The basic
approach is to do something like this (partially stolen from
the chunked filter):

    cl = ngx_chain_get_free_buf(r->pool, &ctx->free);
    if (cl == NULL) {
        return NGX_ERROR;
    }

    if (cl->buf->start == NULL) {
        /*
         * allocate memory for a buffer, if you really need one 
         * with associated memory region
         */

        ...
    }

    cl->buf->tag = (ngx_buf_tag_t) &ngx_http_chunked_filter_module;

    ...

    rc = ngx_http_output_filter(r, out);

    ngx_chain_update_chains(r->pool, &ctx->free, &ctx->busy, &out,
                            (ngx_buf_tag_t) &ngx_http_chunked_filter_module);

    ...

You may also want to add some extra processing to ensure that no
more than a specified number of buffers will be allocated, handle
the case when you can't allocate more buffers and so on.  Note that
the chunked filter doesn't do this as it doesn't really care (it just
wants to reuse its own buffers, but doesn't cap the number of them);
you may want to take a look at the gzip filter and/or the upstream
module and event pipe code for more complex examples.
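
One possible shape for such a cap, as a standalone sketch: take a
buffer from the free list if there is one, allocate a new one only
while under the limit, and otherwise report that the caller has to
wait until earlier buffers have been sent.  Everything here
(stream_ctx_t, get_capped_buf(), max_bufs, the at_limit flag, plain
malloc() instead of pool allocation) is hypothetical and only
illustrates the idea; it is not taken from the gzip filter or any
other nginx module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for ngx_buf_t / ngx_chain_t (assumptions). */
typedef struct {
    unsigned char *start, *pos, *last, *end;
} buf_t;

typedef struct chain_s chain_t;
struct chain_s {
    buf_t   *buf;
    chain_t *next;
};

/* Hypothetical per-request context for a streaming module. */
typedef struct {
    chain_t *free_list;   /* buffers returned after being fully sent */
    size_t   allocated;   /* buffers created so far */
    size_t   max_bufs;    /* cap on the number of buffers */
    size_t   buf_size;    /* size of each buffer's memory region */
} stream_ctx_t;

/*
 * Returns a chain link with a usable buffer, or NULL.  On NULL,
 * *at_limit tells the caller whether the cap was hit (1: wait for
 * buffers to be sent and freed) or allocation failed (0).
 */
chain_t *
get_capped_buf(stream_ctx_t *ctx, int *at_limit)
{
    chain_t *cl;
    buf_t   *b;

    assert(ctx != NULL && at_limit != NULL);
    *at_limit = 0;

    /* Prefer a buffer that has already been sent and freed. */
    if (ctx->free_list != NULL) {
        cl = ctx->free_list;
        ctx->free_list = cl->next;
        cl->next = NULL;
        return cl;
    }

    /* Nothing free: allocate only while under the cap. */
    if (ctx->allocated >= ctx->max_bufs) {
        *at_limit = 1;
        return NULL;
    }

    cl = malloc(sizeof(chain_t));
    b = malloc(sizeof(buf_t));
    if (cl == NULL || b == NULL) {
        free(cl);
        free(b);
        return NULL;
    }

    b->start = malloc(ctx->buf_size);
    if (b->start == NULL) {
        free(cl);
        free(b);
        return NULL;
    }

    b->pos = b->start;
    b->last = b->start;
    b->end = b->start + ctx->buf_size;

    cl->buf = b;
    cl->next = NULL;
    ctx->allocated++;

    return cl;
}
```

When the cap is hit, a module would typically stop producing output
and try again once the connection has drained and the free list has
been refilled.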

Maxim Dounin