Chunked request body and HTTP header parser

Maxim Dounin mdounin at mdounin.ru
Mon Nov 23 16:16:03 MSK 2009


Hello!

On Mon, Nov 23, 2009 at 12:30:47PM +0000, Valery Kholodkov wrote:

> Greetings!
> 
> A long time ago I briefly mentioned that I'm working on chunked
> HTTP request body processing.
> 
> While implementing it I have run into several problems.
> 
> The main problem is that a chunked HTTP request body does not 
> always contain a Content-Length header,

That's the whole point why chunked encoding exists.  :)

> therefore it is impossible to determine how many bytes it is 
> necessary to read in order to get the complete body. On a 
> keepalive connection this is problematic, because the header of 
> the next pipelined request may immediately follow the body or 
> trailer of the current request. Assume that we are trying to 
> read the body into preallocated buffers in the most efficient 
> way. The header of the next request may end up in the request 
> body buffer. This creates an undesirable situation: on the one 
> hand the chunked body filter must signal the end of the request 
> body; on the other hand the remaining part of the buffer, which 
> the chunked body filter does not want to consume, must be 
> returned to the HTTP header parser. I've found that the last 
> part -- returning the remaining bytes to the HTTP header parser 
> -- is not something that can be done easily, due to the 
> complexity of the HTTP header parser. Does anyone have a clue 
> how it could be implemented?

With chunked encoding you always know the next chunk size as soon 
as you have received the chunk header.  So basically you may read 
the chunk header(s) char-by-char to do things safely (well, not 
really char-by-char; it's safe to read at least 5 bytes 
initially, but it doesn't really matter).
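For illustration, the char-by-char approach boils down to an 
incremental chunk-size parser that stops exactly at the LF 
terminating the chunk-size line, so it never consumes bytes that 
may belong to the next pipelined request.  This is a standalone 
sketch, not nginx code; `chunk_header_state` and 
`parse_chunk_header` are invented names, and chunk extensions 
(";name=value") are not handled:

```c
#include <stddef.h>

/* Incremental chunk-size line parser (illustrative sketch).
 * Returns the number of bytes consumed from buf, or -1 on a
 * syntax error.  Sets st->done once the terminating CRLF has
 * been seen; st->size then holds the chunk size. */
typedef struct {
    size_t size;       /* chunk size accumulated so far (hex) */
    int    seen_cr;    /* saw '\r', expecting '\n' next */
    int    done;       /* chunk-size line fully parsed */
} chunk_header_state;

static int
parse_chunk_header(chunk_header_state *st, const char *buf, size_t n)
{
    size_t  i;
    char    c;

    for (i = 0; i < n; i++) {
        c = buf[i];

        if (st->seen_cr) {
            if (c != '\n') {
                return -1;
            }
            st->done = 1;
            return (int) (i + 1);    /* bytes consumed incl. LF */
        }

        if (c == '\r') {
            st->seen_cr = 1;

        } else if (c >= '0' && c <= '9') {
            st->size = st->size * 16 + (size_t) (c - '0');

        } else if (c >= 'a' && c <= 'f') {
            st->size = st->size * 16 + (size_t) (c - 'a' + 10);

        } else if (c >= 'A' && c <= 'F') {
            st->size = st->size * 16 + (size_t) (c - 'A' + 10);

        } else {
            return -1;    /* chunk extensions not handled here */
        }
    }

    return (int) n;    /* consumed everything, need more data */
}
```

Because the state survives across calls, the parser can be fed 
one byte at a time from recv() without ever over-reading.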

I don't think it's a good idea though; IMHO it's better to 
implement some form of preread handling for pipelined requests.
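The preread idea can be sketched roughly as follows: when the 
body filter finishes at some offset but the buffer was filled 
further, stash the leftover range and let the keepalive handler 
start header parsing from it instead of calling recv().  All 
names here are hypothetical, not the nginx API:

```c
#include <stddef.h>

/* Hypothetical preread bookkeeping (invented names).  body_end
 * is where the chunked body filter stopped consuming; data_end
 * is how far the buffer was actually filled.  The bytes in
 * between belong to the next pipelined request. */
typedef struct {
    char  *pos;     /* first byte of the next request, if any */
    char  *last;    /* end of data already read */
} preread_t;

static size_t
save_preread(preread_t *pr, char *body_end, char *data_end)
{
    pr->pos  = body_end;
    pr->last = data_end;

    return (size_t) (data_end - body_end);   /* preread length */
}
```

With this, the header parser never needs to "take back" part of 
a body buffer; it simply begins at pr->pos on the next request.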

> The second problem is that the efficiency of request body 
> reception is limited by the connection's recv call, because 
> recv_chain does not take a read limit in bytes as an argument. 
> Therefore, even if Content-Length is given, it is impossible to 
> read the request body into multiple buffers and thus reduce 
> memory consumption. If I grep for recv_chain in nginx's code, I 
> see that it is used only in src/event/ngx_event_pipe.c. It 
> doesn't seem that a lot of stuff will be broken if an argument 
> is added to recv_chain. 
> 
> Does anyone see any other problems if an argument is added to recv_chain?

I'm not sure this is needed if preread handling (see above) is 
implemented, but I see no problems with it anyway.
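As a sketch of what such a limit argument could do: before 
calling readv(), the iovec array built from the free buffers can 
be trimmed so that at most `limit` bytes are read, keeping a 
pipelined header out of the body buffers.  `cap_iovecs` is a 
made-up helper, not the actual recv_chain interface:

```c
#include <sys/uio.h>
#include <stddef.h>

/* Illustrative sketch (not nginx code): cap an iovec array so
 * that readv() cannot read more than `limit` bytes in total,
 * e.g. past the end of the current request body.  Returns the
 * number of iovecs to pass to readv(); the last one may have
 * been shortened. */
static int
cap_iovecs(struct iovec *iov, int n, size_t limit)
{
    int     i;
    size_t  total = 0;

    for (i = 0; i < n; i++) {
        if (total + iov[i].iov_len >= limit) {
            iov[i].iov_len = limit - total;   /* shorten last vec */
            return (iov[i].iov_len > 0) ? i + 1 : i;
        }

        total += iov[i].iov_len;
    }

    return n;    /* limit larger than all buffers combined */
}
```

A recv_chain taking an extra byte-limit argument could apply this 
trimming internally and otherwise behave exactly as today.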

> Now, you might ask why I am writing this when someone has already implemented reception of chunked bodies. I've seen the implementation of a chunked body parser from agentzh:
> 
> http://github.com/agentzh/chunkin-nginx-module
> 
> It is nice, but I think it neither addresses the pipelined requests problem, nor can it be used with standard modules like proxy, because it re-implements the function ngx_http_read_client_request_body.
> 
> Could someone comment on that?

I have some preliminary chunked pipe filter implementation for 
use in the proxy code, but I believe it's a good idea to have 
something generic like ngx_event_pipe_copy_input_filter() and 
reuse it in both the client chunked and upstream chunked cases.  
Not sure if it's possible though.

Maxim Dounin



More information about the nginx-devel mailing list