Sharing data when downloading the same object from upstream

SplitIce mat999 at gmail.com
Fri Aug 30 08:42:04 UTC 2013


This is an interesting idea. While I don't see it being all that useful for
most applications, there are some that could really benefit (large-file
proxying first comes to mind). If it could be achieved without introducing
too much CPU overhead in keeping track of the requests and available
parts, it would be quite interesting.

I would like to see an option to supply a minimum size to restrict this
feature to (either by adding the object to the map/rbtree only after x
bytes have passed, or based on the Content-Length header).
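The minimum-size gate suggested above could be sketched roughly as follows (standalone C; the names `should_share_download` and `min_share_size` are hypothetical illustrations, not nginx API):

```c
#include <sys/types.h>

/* Decide whether a response is worth tracking in the shared-download
 * structure. If the upstream sent a Content-Length, compare it against
 * the configured minimum up front; for responses of unknown length
 * (content_length < 0), defer the decision until enough bytes have
 * actually passed through. */
static int
should_share_download(off_t content_length, off_t bytes_received,
                      off_t min_share_size)
{
    if (content_length >= 0) {
        return content_length >= min_share_size;
    }

    /* unknown length: start sharing only after min_share_size bytes */
    return bytes_received >= min_share_size;
}
```

Either branch keeps small objects out of the map entirely, so the bookkeeping cost is only paid for downloads large enough to benefit.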

Regards,
Mathew


On Fri, Aug 30, 2013 at 6:01 PM, Anatoli Marinov <a.marinov at ucdn.com> wrote:

> Hello,
>
>
> On Wed, Aug 28, 2013 at 7:56 PM, Alex Garzão <alex.garzao at azion.com> wrote:
>
>> Hello Anatoli,
>>
>> Thanks for your reply. I will appreciate (a lot) your help :-)
>>
>> I'm trying to fix the code with the following requirements in mind:
>>
>> 1) We have upstreams/downstreams with good (and bad) links; in
>> general, upstream speed is higher than downstream speed but, in some
>> situations, the downstream is much faster than the
>> upstream;
>>
> I think this is asynchronous: if the upstream is faster than the
> downstream, it saves the data to the cached file faster, and the downstream
> gets the data from the file instead of the memory buffers.
>
>
>> 2) I'm trying to decouple the upstream speed from the downstream
>> speed. The first request (the request that actually connects to the
>> upstream) downloads data to a temp file, but no longer sends data to
>> its downstream. I disabled this because, in my understanding, if the first
>> request has a slow downstream, all other downstreams would have to wait
>> for data to be sent to this slow downstream.
>>
> I think this is not necessary.
>
>
>>
>> My first doubt is: do I need to worry about downstream/upstream speed?
>>
> No
>
>
>> Well, I will try to explain what I did in the code:
>>
>> 1) I created an rbtree (current_downloads) that keeps the current
>> downloads (one rbtree per upstream). Each node keeps the first request
>> (the request that actually connects to the upstream) and a list
>> (download_info_list) with two fields per entry: (a) the request waiting
>> for data from the temp file and (b) the file offset already sent from
>> the temp file (last_offset);
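The node and list described in (1) could be sketched like this (standalone C with plain pointers and `malloc`; in nginx the request field would be an `ngx_http_request_t *` and the allocations would come from a pool, and all names here are illustrative):

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/* One downstream waiting on a shared temp file: the waiting request and
 * the offset already sent to it, as described in the mail. */
typedef struct download_info_s {
    void                    *request;      /* ngx_http_request_t * in nginx */
    off_t                    last_offset;  /* bytes already sent downstream */
    struct download_info_s  *next;
} download_info_t;

/* One node of current_downloads: the first request (the one connected to
 * the upstream) plus the list of subscribed downstreams. */
typedef struct {
    void            *first_request;
    download_info_t *waiters;
    size_t           nwaiters;
} download_node_t;

/* Subscribe another request to an in-progress download. */
static void
download_node_subscribe(download_node_t *node, void *request)
{
    download_info_t *info = malloc(sizeof(download_info_t));

    info->request = request;
    info->last_offset = 0;        /* nothing sent to this downstream yet */
    info->next = node->waiters;
    node->waiters = info;
    node->nwaiters++;
}
```

As Anatoli notes below, in a multi-worker setup this structure would have to live in shared memory (and use offsets rather than raw pointers) so that all workers see the same set of in-progress downloads.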
>>
>>
> I have the same but in an ordered array (a simpler implementation). Anyway,
> the rbtree will do the same. But this structure should be in shared memory,
> because all workers should know which files are currently being downloaded
> from the upstream. The files should exist in the tmp directory.
>
>
>> 2) In ngx_http_upstream_init_request(), when the object isn't in the
>> cache, before connecting to the upstream, I check whether the object is
>> in the rbtree (current_downloads);
>>
>> 3) When the object isn't in current_downloads, I add a node that
>> contains the first request (equal to the current request) and I add the
>> current request to the download_info_list. In addition, I create a
>> timer event (polling) that checks all requests in
>> download_info_list and verifies whether there is data in the temp file
>> that has not yet been sent to the downstream. I create one timer event
>> per object [1].
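The per-waiter bookkeeping that the timer in (3) performs on each tick (and that step 6 below relies on) reduces to a small delta computation, sketched here as standalone C (`poll_waiter` is an illustrative name, not nginx API):

```c
#include <sys/types.h>

/* One polling tick for a single waiter: given how far the temp file has
 * grown (file_offset, i.e. pipe->temp_file->offset) and how much this
 * downstream has already been sent (*last_offset), return the number of
 * new bytes to push and advance the bookkeeping. Returns 0 when the
 * waiter is already up to date. */
static off_t
poll_waiter(off_t file_offset, off_t *last_offset)
{
    off_t bytes;

    if (file_offset <= *last_offset) {
        return 0;                 /* no new data in the temp file yet */
    }

    bytes = file_offset - *last_offset;
    *last_offset = file_offset;

    return bytes;
}
```

The returned byte count is what would be handed to something like ngx_http_upstream_cache_send_partial() for that request.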
>>
>> 4) When the object is in current_downloads, I add the request to
>> download_info_list and leave ngx_http_upstream_init_request() (I
>> just return without executing ngx_http_upstream_finalize_request());
>>
>> 5) I have disabled (in ngx_event_pipe) the code that sends data to
>> the downstream (requirement 2);
>>
>> 6) In the polling event, I get the current temp file offset
>> (first_request->upstream->pipe->temp_file->offset) and I check in the
>> download_info_list whether it is greater than last_offset. If so, I send
>> more data to the downstream with ngx_http_upstream_cache_send_partial
>> (code below);
>>
>> 7) In the polling event, when pipe->upstream_done ||
>> pipe->upstream_eof || pipe->upstream_error, and all data has been sent to
>> the downstream, I execute ngx_http_upstream_finalize_request() for all
>> requests;
>>
>> 8) I added a bit flag (first_download_request) to the ngx_http_request_t
>> struct to prevent the first request from being finished before all
>> requests have completed. I check this flag in
>> ngx_http_upstream_finalize_request(). But, honestly, I'm not sure
>> whether preventing this situation is really necessary...
>>
>>
>> Below you can see the ngx_http_upstream_cache_send_partial code:
>>
>>
>> /////////////
>> static ngx_int_t
>> ngx_http_upstream_cache_send_partial(ngx_http_request_t *r,
>> ngx_temp_file_t *file, off_t offset, off_t bytes, unsigned last_buf)
>> {
>>     ngx_buf_t         *b;
>>     ngx_chain_t        out;
>>     ngx_http_cache_t  *c;
>>
>>     c = r->cache;
>>
>>     /* we need to allocate all before the header would be sent */
>>
>>     b = ngx_pcalloc(r->pool, sizeof(ngx_buf_t));
>>     if (b == NULL) {
>>         return NGX_HTTP_INTERNAL_SERVER_ERROR;
>>     }
>>
>>     b->file = ngx_pcalloc(r->pool, sizeof(ngx_file_t));
>>     if (b->file == NULL) {
>>         return NGX_HTTP_INTERNAL_SERVER_ERROR;
>>     }
>>
>>     /* FIX: need to run ngx_http_send_header(r) once... */
>>
>>     /* file_pos/file_last are absolute offsets into the temp file, so
>>      * the `bytes` argument must be the end offset, not a byte count */
>>     b->file_pos = offset;
>>     b->file_last = bytes;
>>
>>     b->in_file = 1;
>>     b->last_buf = last_buf;
>>     b->last_in_chain = 1;
>>
>>     b->file->fd = file->file.fd;
>>     b->file->name = file->file.name;
>>     b->file->log = r->connection->log;
>>
>>     out.buf = b;
>>     out.next = NULL;
>>
>>     return ngx_http_output_filter(r, &out);
>> }
>> ////////////
>>
>> My second doubt is: could I just fix ngx_event_pipe to send to all
>> requests (instead of sending to just one request)? And, if so, can
>> ngx_http_output_filter be used to send a big chunk the first time
>> (300 MB or more) and little chunks after that?
>>
>>
> Use smaller chunks.
>
> Thanks in advance for your attention :-)
>>
>> [1] I know that a polling event is a bad approach with NGINX, but I
>> don't know how to avoid it. For example, the upstream download can be
>> very fast, and it's possible that I need to send data to the downstream
>> in little chunks. The upstream (in NGINX) is socket-event based but,
>> when the download from the upstream finishes, which event can I expect?
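One way to avoid the timer entirely, sketched below as standalone C, is to invert control: instead of waiters periodically polling the temp file, the code path that writes upstream data to the temp file walks the subscriber list and wakes each waiter with the new file size. (In nginx terms this would mean posting an event or calling the waiters' write handlers from the event-pipe write path; the callback type, `waiter_list_t`, and `wakeup_waiters` here are illustrative assumptions, not nginx API.)

```c
#include <stddef.h>
#include <sys/types.h>

#define MAX_WAITERS 8

/* Each waiter registers a callback plus a context pointer; the writer
 * invokes them whenever new bytes land in the temp file. */
typedef void (*waiter_cb)(void *ctx, off_t file_offset);

typedef struct {
    waiter_cb  cb[MAX_WAITERS];
    void      *ctx[MAX_WAITERS];
    size_t     n;
} waiter_list_t;

/* Called from the write path after the temp file grows: notify every
 * subscribed downstream of the new file size, so each can push the
 * delta it has not sent yet. */
static void
wakeup_waiters(waiter_list_t *wl, off_t file_offset)
{
    size_t i;

    for (i = 0; i < wl->n; i++) {
        wl->cb[i](wl->ctx[i], file_offset);
    }
}

/* Example callback: record the announced offset into the context. */
static void
record_offset(void *ctx, off_t file_offset)
{
    *(off_t *) ctx = file_offset;
}
```

With this shape, the "which event can I expect?" question answers itself: the end-of-download condition (upstream_done/eof/error) is just one more notification from the same write path, so no timer granularity has to be tuned.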
>>
>> Regards.
>> --
>> Alex Garzão
>> Projetista de Software
>> Azion Technologies
>> alex.garzao (at) azion.com
>>
>> _______________________________________________
>> nginx-devel mailing list
>> nginx-devel at nginx.org
>> http://mailman.nginx.org/mailman/listinfo/nginx-devel
>>
>
> You are on the right track. Just keep digging. Do not forget to turn off
> this feature when you have flv or mp4 seek, partial (range) requests, or a
> content-encoding different than identity, because otherwise you will send
> broken files to the browsers.
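The guard Anatoli describes could be sketched as a simple predicate checked before subscribing a request to a shared download (standalone C string checks here; in nginx these would be tests on the request's parsed header fields, and `sharing_allowed` is an illustrative name):

```c
#include <string.h>

/* Sharing must be bypassed whenever the byte stream would differ between
 * subscribers: Range (partial) requests, and responses whose body is
 * transformed by a Content-Encoding other than "identity". flv/mp4 seek
 * arguments would need an analogous check on the request line. */
static int
sharing_allowed(const char *range_header, const char *content_encoding)
{
    if (range_header != NULL) {
        return 0;                  /* partial request: do not share */
    }

    if (content_encoding != NULL
        && strcmp(content_encoding, "identity") != 0)
    {
        return 0;                  /* transformed body: do not share */
    }

    return 1;
}
```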
>
>