Chunked request body and HTTP header parser
Maxim Dounin
mdounin at mdounin.ru
Mon Nov 23 16:16:03 MSK 2009
Hello!
On Mon, Nov 23, 2009 at 12:30:47PM +0000, Valery Kholodkov wrote:
> Greetings!
>
> A long time ago I briefly mentioned that I was working on chunked HTTP request body processing.
>
> While implementing it I have run into several problems.
>
> The main problem is that a request with a chunked body does not
> always carry a Content-Length header,
That's the whole point why chunked encoding exists. :)
> therefore it is impossible to determine how many bytes it is
> necessary to read in order to get the complete body. In a keepalive
> connection this is problematic, because the header of the next
> pipelined request may immediately follow the body or trailer of
> the current request. Assume that we are trying to read the body into
> preallocated buffers in the most efficient way. The header of the next
> request may end up in the request body buffer. This creates an
> undesired situation: on one hand the chunked body filter
> must signal the end of the request body, on the other hand the
> remaining part of the buffer, which the chunked body filter does not
> want to consume, must be returned to the HTTP header parser.
> I've found that the last thing -- returning the remaining part
> of the buffer to the HTTP header parser -- is not something that
> can be easily done, due to the complexity of the HTTP header
> parser. Does anyone have a clue how it could be implemented?
With chunked encoding you always know the next chunk's size as soon
as you have received the chunk header. So basically you may read
chunk header(s) char-by-char to do things safely (well, not really
char-by-char, it's safe to read at least 5 bytes initially, since
the shortest possible chunked body, "0" CRLF CRLF, is 5 bytes long,
but it doesn't really matter).
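A minimal sketch of the cautious approach described above, in plain C. The names parse_chunk_size() and min_following() are hypothetical and are not part of nginx. The first parses one chunk-size line from whatever bytes have arrived so far; the second returns how many further bytes are guaranteed to belong to the same request, so a read of that length cannot run into a pipelined request.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper, not an nginx function: parse one chunk-size line
 * ("<hex>[;extension]\r\n") from buf[0..len).
 * Returns 1 when the line is complete (*size and *used are set),
 * 0 when more bytes are needed, -1 when the line is malformed. */
int parse_chunk_size(const char *buf, size_t len,
                     unsigned long *size, size_t *used)
{
    unsigned long n = 0;
    size_t i = 0;
    int digits = 0;

    assert(buf != NULL && size != NULL && used != NULL);

    /* hex digits of the chunk size */
    for (; i < len; i++) {
        char c = buf[i];
        int v;

        if (c >= '0' && c <= '9') v = c - '0';
        else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
        else break;

        if (n > (ULONG_MAX >> 4)) return -1;    /* size would overflow */
        n = (n << 4) | (unsigned long) v;
        digits = 1;
    }

    if (i == len) return 0;                     /* line not finished yet */
    if (!digits) return -1;

    /* skip an optional chunk extension up to the CR */
    if (buf[i] == ';') {
        while (i < len && buf[i] != '\r') i++;
        if (i == len) return 0;
    }

    if (buf[i] != '\r') return -1;
    if (i + 1 == len) return 0;
    if (buf[i + 1] != '\n') return -1;

    *size = n;
    *used = i + 2;
    return 1;
}

/* Bytes that must still follow the size line within the same request. */
unsigned long min_following(unsigned long size)
{
    if (size == 0) {
        return 2;           /* at least the CRLF that ends the trailer */
    }
    return size + 2 + 5;    /* data, CRLF, then at least "0\r\n\r\n" */
}
```

For example, after the size line "1a" CRLF the reader may ask for up to 33 more bytes in one call. The sketch does not guard the addition in min_following() against overflow for huge sizes.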
I don't think it's a good idea though; IMHO it's better to
implement some form of preread handling for pipelined requests.
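One way to picture preread handling is a decoder that reports how many bytes of a buffer it consumed, so that anything left over can be handed to the header parser as already-read input. The sketch below is plain C with hypothetical names (chunked_t, chunked_feed()), not nginx code, and it assumes there are no chunk extensions and no trailer fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder states; CH_SIZE is 0 so a zeroed struct is valid. */
enum { CH_SIZE, CH_SIZE_LF, CH_DATA, CH_DATA_CR, CH_DATA_LF,
       CH_LAST_CR, CH_LAST_LF, CH_DONE };

typedef struct {
    int    state;
    int    digits;   /* at least one hex digit seen in the size line */
    size_t left;     /* size being parsed, then data bytes still expected */
} chunked_t;

static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Consume bytes of a chunked body from buf[0..len).
 * Returns the number of bytes consumed, or -1 on a malformed body.
 * Once c->state == CH_DONE, buf + result is the preread data
 * (for example the start of the next pipelined request). */
long chunked_feed(chunked_t *c, const char *buf, size_t len)
{
    size_t i = 0;

    assert(c != NULL && buf != NULL);

    while (i < len && c->state != CH_DONE) {
        char ch = buf[i];

        switch (c->state) {
        case CH_SIZE:
            if (ch == '\r') {
                if (!c->digits) return -1;
                c->state = CH_SIZE_LF;
            } else {
                int v = hexval(ch);
                if (v < 0 || c->left > (SIZE_MAX >> 4)) return -1;
                c->left = (c->left << 4) | (size_t) v;
                c->digits = 1;
            }
            i++;
            break;
        case CH_SIZE_LF:
            if (ch != '\n') return -1;
            c->state = c->left ? CH_DATA : CH_LAST_CR;
            i++;
            break;
        case CH_DATA: {
            /* skip as much chunk data as this buffer holds */
            size_t n = len - i;
            if (n > c->left) n = c->left;
            i += n;
            c->left -= n;
            if (c->left == 0) c->state = CH_DATA_CR;
            break;
        }
        case CH_DATA_CR:
            if (ch != '\r') return -1;
            c->state = CH_DATA_LF;
            i++;
            break;
        case CH_DATA_LF:
            if (ch != '\n') return -1;
            c->state = CH_SIZE;
            c->digits = 0;
            i++;
            break;
        case CH_LAST_CR:
            if (ch != '\r') return -1;
            c->state = CH_LAST_LF;
            i++;
            break;
        case CH_LAST_LF:
            if (ch != '\n') return -1;
            c->state = CH_DONE;
            i++;
            break;
        }
    }

    return (long) i;
}
```

The caller keeps reading into large buffers as usual. When the state reaches CH_DONE, the unconsumed tail of the buffer is kept and parsed as the next request header before any further recv call is made.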
> The second problem is that the efficiency of request body reception
> is limited by the connection's recv call, because recv_chain does
> not take a read limit in bytes as an argument. Therefore, even if
> Content-Length is given, it is impossible to read the request body
> into multiple buffers, which would reduce memory consumption. If
> I grep for recv_chain in nginx's code, I see that it is used
> only in src/event/ngx_event_pipe.c. It doesn't seem that a lot
> of stuff would break if an argument were added to
> recv_chain.
>
> Does anyone see any other problems if an argument is added to recv_chain?
I'm not sure this is needed if preread handling (see above) is
implemented, but I see no problems anyway.
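To illustrate what a byte limit on a vectored read could look like, here is a sketch built on POSIX readv() rather than on nginx's own recv_chain. The type rbuf_t and the functions build_iov() and recv_limited() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical buffer descriptor, standing in for a chain link. */
typedef struct {
    char   *start;
    size_t  size;
} rbuf_t;

/* Fill iov[] from up to nbufs buffers without exceeding limit bytes
 * in total. iov must have room for nbufs entries.
 * Returns the number of iovec entries filled. */
int build_iov(const rbuf_t *bufs, int nbufs, size_t limit, struct iovec *iov)
{
    int n = 0;

    assert(bufs != NULL && iov != NULL);

    for (int i = 0; i < nbufs && limit > 0; i++) {
        size_t len = bufs[i].size < limit ? bufs[i].size : limit;

        if (len == 0) continue;     /* skip empty buffers */

        iov[n].iov_base = bufs[i].start;
        iov[n].iov_len = len;
        n++;
        limit -= len;
    }

    return n;
}

/* Read at most limit bytes from fd, spread over several buffers. */
ssize_t recv_limited(int fd, const rbuf_t *bufs, int nbufs, size_t limit)
{
    struct iovec iov[16];
    int n;

    if (nbufs > 16) nbufs = 16;     /* arbitrary cap for the sketch */

    n = build_iov(bufs, nbufs, limit, iov);
    if (n == 0) return 0;

    return readv(fd, iov, n);
}
```

With a known Content-Length, limit would be the number of body bytes still expected, so the last buffer is only partly filled and the read never crosses into the next request.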
> Now, you might ask me why I am writing this when someone has already implemented reception of a chunked body. I've seen the implementation of a chunked body parser from agentzh:
>
> http://github.com/agentzh/chunkin-nginx-module
>
> It is nice, but I think it neither addresses the pipelined requests problem, nor can it be used with standard modules, like proxy, because it re-implements the function ngx_http_read_client_request_body.
>
> Could someone comment on that?
I have a preliminary chunked pipe filter implementation for use
in the proxy code, but I believe it's a good idea to have something
generic like ngx_event_pipe_copy_input_filter() and reuse it in
both the client chunked and upstream chunked cases. Not sure if it's
possible though.
Maxim Dounin