Sharing data when download the same object from upstream

SplitIce mat999 at gmail.com
Fri Aug 30 09:04:37 UTC 2013


Is the patch on this mailing list (forgive me, I can't see it)?

I'll happily test it for you, although for me to get any personal benefit
there would need to be a size restriction, since 99.9% of requests are just
small HTML documents and would not benefit. Also, standard caching behavior
(headers that result in a cache miss, e.g. cookies or Cache-Control) would
have to be handled correctly.
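A minimal sketch of the eligibility check described above. It assumes the proxy has already parsed the response's Content-Length and headers into a small struct. The struct, its fields, and the threshold are hypothetical and are not nginx APIs. The Cache-Control test is a deliberately crude, case-sensitive substring match:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of an upstream response, filled in once its
 * headers have been parsed. Not an nginx structure. */
typedef struct {
    long long   content_length;  /* -1 when Content-Length is absent */
    bool        has_set_cookie;  /* response carries Set-Cookie */
    const char *cache_control;   /* raw Cache-Control value, or NULL */
} resp_info_t;

/* Crude, case-sensitive check for directives this sketch treats as
 * "do not share": no-store, private, no-cache. */
static bool cc_blocks_sharing(const char *cc)
{
    return cc != NULL
           && (strstr(cc, "no-store") != NULL
               || strstr(cc, "private") != NULL
               || strstr(cc, "no-cache") != NULL);
}

/* Share a download only if the response looks cacheable and its declared
 * size is at least min_size bytes; unknown length is treated as too small. */
bool share_eligible(const resp_info_t *r, long long min_size)
{
    if (r->has_set_cookie || cc_blocks_sharing(r->cache_control)) {
        return false;
    }
    return r->content_length >= 0 && r->content_length >= min_size;
}
```

With a 10 MB threshold, `share_eligible(&resp, 10 * 1024 * 1024)` would reject the small HTML documents that make up most traffic before any shared-download bookkeeping happens.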

At the very least I'll read over it and see if I spot anything or have
recommendations.

Regards,
Mathew


On Fri, Aug 30, 2013 at 6:25 PM, Anatoli Marinov <a.marinov at ucdn.com> wrote:

> I discussed the idea here on the mailing list years ago, but none of the
> main developers liked it. However, I developed a patch, and we have had it
> in production for more than a year; it works fine.
>
> Just think of the following case:
> You have a new file which is 1 GB and is located far from the cache.
> Even if you can download it through the cache upstream at 5 MBps, you need
> 200 seconds to get it. This file is a video, and because it is new it is
> placed on the first page. In the first 30 seconds your caching server may
> receive 1000 requests (or even more) for this file, and you cannot block
> all new requests for 170 seconds ?! waiting for the file to be downloaded.
> Also, if all requests are sent to the origin, your proxy will generate
> 1 TB of traffic instead of 1 GB.
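The numbers in this example can be checked with a quick C sketch. It assumes decimal units (1 GB = 1000 MB, 1 TB = 1000 GB). The function names are illustrative only:

```c
/* Decimal units, as in the example: 1 GB = 1000 MB, 1 TB = 1000 GB. */

/* Seconds needed to fetch size_mb megabytes at rate_mb_s MB/s. */
double fetch_seconds(double size_mb, double rate_mb_s)
{
    return size_mb / rate_mb_s;
}

/* Origin traffic in MB for n_requests concurrent misses on one object:
 * each request fetches the whole object unless the download is shared,
 * in which case the origin is hit exactly once. */
double origin_traffic_mb(double size_mb, long n_requests, int shared)
{
    return shared ? size_mb : size_mb * (double) n_requests;
}
```

`fetch_seconds(1000, 5)` gives 200 s, so a client arriving at t = 30 s would wait the remaining 170 s. `origin_traffic_mb(1000, 1000, 0)` gives 1,000,000 MB (1 TB), against 1000 MB when the download is shared.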
>
> It would be amazing if this feature were implemented as part of the
> common caching mechanism.
>
>
>
> On Fri, Aug 30, 2013 at 11:42 AM, SplitIce <mat999 at gmail.com> wrote:
>
>> This is an interesting idea. While I don't see it being all that useful
>> for most applications, there are some that could really benefit (large-file
>> proxying comes to mind first). If it could be achieved without introducing
>> too much CPU overhead in keeping track of the requests and available
>> parts, it would be quite interesting.
>>
>> I would also like to see an option to supply a minimum size to restrict
>> this feature (either by adding the object to the map/rbtree/whatever once
>> x bytes have passed, or based on Content-Length).
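The two options above can be combined in one sketch. It assumes a per-download state struct that is updated on every chunk read from upstream; `dl_state_t` and `should_publish()` are hypothetical. When Content-Length is known, the download is published to the shared map at once if it is large enough. Otherwise it is published only after `min_size` bytes have actually arrived:

```c
#include <stdbool.h>

/* Hypothetical per-download state; not an nginx structure. */
typedef struct {
    long long content_length;  /* -1 when Content-Length is absent */
    long long received;        /* bytes received from upstream so far */
    bool      published;       /* already added to the shared map */
} dl_state_t;

/* Call with chunk = 0 when the response headers arrive, then once per
 * chunk read from upstream. Returns true exactly once: at the moment the
 * download should be added to the shared map/rbtree. With a known
 * Content-Length the decision is made up front; without one, the
 * download is published only after min_size bytes have arrived. */
bool should_publish(dl_state_t *s, long long chunk, long long min_size)
{
    bool big;

    s->received += chunk;

    if (s->published) {
        return false;
    }

    if (s->content_length >= 0) {
        big = s->content_length >= min_size;
    } else {
        big = s->received >= min_size;
    }

    if (big) {
        s->published = true;
    }
    return big;
}
```

The byte-counting path covers chunked responses with no Content-Length. The cost is that requests arriving before the threshold is crossed still go to the origin on their own.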
>>
>> Regards,
>> Mathew
>>
>>
>> On Fri, Aug 30, 2013 at 6:01 PM, Anatoli Marinov <a.marinov at ucdn.com> wrote:
>>
>>> Hello,
>>>
>>>
>>> On Wed, Aug 28, 2013 at 7:56 PM, Alex Garzão <alex.garzao at azion.com> wrote:
>>>
>>>> Hello Anatoli,
>>>>
>>>> Thanks for your reply. I would appreciate your help (a lot) :-)
>>>>
>>>> I'm trying to fix the code with the following requirements in mind:
>>>>
>>>> 1) We have upstreams/downstreams with good (and bad) links; in
>>>> general, the upstream is faster than the downstream but, in some
>>>> situations, the downstream is much faster than the upstream;
>>>>
>>> I think this is asynchronous: if the upstream is faster than the
>>> downstream, the data is saved to the cache file faster and the downstream
>>> gets the data from the file instead of from the memory buffers.
>>>
>>>
>>>> 2) I'm trying to decouple the upstream speed from the downstream
>>>> speed. The first request (the request that will actually connect to the
>>>> upstream) downloads data to the temp file but no longer sends data to
>>>> its downstream. I disabled this because, as I understand it, if the
>>>> first request has a slow downstream, all other downstreams will wait for
>>>> data to be sent to this slow downstream.
>>>>
>>> I think this is not necessary.
>>>
>>>
>>>>
>>>> My first question is: do I need to worry about downstream/upstream speed?
>>>>
>>> No.
>>>
>>>
>>>> Well, I will try to explain what I did in the code:
>>>>
>>>> 1) I created an rbtree (current_downloads) that keeps the current
>>>> downloads (one rbtree per upstream). Each node keeps the first request
>>>> (the request that will actually connect to the upstream) and a list
>>>> (download_info_list) whose entries keep two fields: (a) a request
>>>> waiting for data from the temp file and (b) the file offset already sent
>>>> to it from the temp file (last_offset);
>>>>
>>>>
>>> I have the same, but in an ordered array (a simple implementation). The
>>> rbtree will do the same job anyway. But this structure should be in
>>> shared memory, because all workers need to know which files are
>>> currently being downloaded from the upstream. They should exist in the
>>> tmp directory.
>>>
>>>
>>>> 2) In ngx_http_upstream_init_request(), when the object isn't in the
>>>> cache, before connecting to the upstream, I check whether the object is
>>>> in the rbtree (current_downloads);
>>>>
>>>> 3) When the object isn't in current_downloads, I add a node that
>>>> contains the first request (equal to the current request) and I add the
>>>> current request to download_info_list. In addition, I create a timer
>>>> event (polling) that checks all requests in download_info_list and
>>>> verifies whether the temp file has data not yet sent to the downstream.
>>>> I create one timer event per object [1].
>>>>
>>>> 4) When the object is in current_downloads, I add the request to
>>>> download_info_list and leave ngx_http_upstream_init_request() (I just
>>>> return without executing ngx_http_upstream_finalize_request());
>>>>
>>>> 5) I have disabled (in ngx_event_pipe) the code that sends data to the
>>>> downstream (requirement 2);
>>>>
>>>> 6) In the polling event, I get the current temp file offset
>>>> (first_request->upstream->pipe->temp_file->offset) and check, for each
>>>> entry in download_info_list, whether it is greater than last_offset. If
>>>> so, I send more data to the downstream with
>>>> ngx_http_upstream_cache_send_partial (code below);
>>>>
>>>> 7) In the polling event, when pipe->upstream_done ||
>>>> pipe->upstream_eof || pipe->upstream_error, and all data has been sent
>>>> to the downstreams, I call ngx_http_upstream_finalize_request for all
>>>> requests;
>>>>
>>>> 8) I added a bit flag (first_download_request) to the ngx_http_request_t
>>>> struct to keep the first request from being finalized before all
>>>> requests have completed. I check this flag in
>>>> ngx_http_upstream_finalize_request(). But, really, I'm not sure whether
>>>> it is necessary to avoid this situation...
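The numbered steps above can be modelled in a self-contained sketch of steps 1-4 and 6-7. It uses the ordered array mentioned in the reply rather than an rbtree. Everything here is hypothetical:

- request IDs stand in for ngx_http_request_t pointers;
- the array lives in ordinary memory, not nginx shared memory;
- there is no locking between workers;
- reg_poll() assumes each send is accepted in full.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_DOWNLOADS 64   /* hypothetical capacities */
#define MAX_WAITERS   16
#define KEY_LEN       64

/* One download_info_list entry: a waiting request and its progress. */
typedef struct {
    int       request_id;    /* stand-in for ngx_http_request_t * */
    long long last_offset;   /* temp-file bytes already sent to it */
} waiter_t;

/* One current_downloads node; waiters[0] is the first request. */
typedef struct {
    char      key[KEY_LEN];  /* cache key of the object */
    long long file_offset;   /* bytes written to the temp file so far */
    bool      upstream_done; /* upstream_done/eof/error seen */
    int       nwaiters;
    waiter_t  waiters[MAX_WAITERS];
} download_t;

/* Ordered array sorted by key. In nginx it would need to live in shared
 * memory (with locking) so every worker sees it; here it is plain memory. */
typedef struct {
    int        n;
    download_t d[MAX_DOWNLOADS];
} registry_t;

/* Binary search: index of key, or -(insertion point) - 1 if absent. */
static int reg_find(const registry_t *r, const char *key)
{
    int lo = 0, hi = r->n - 1;

    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int c = strcmp(r->d[mid].key, key);

        if (c == 0) {
            return mid;
        }
        if (c < 0) {
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return -lo - 1;
}

/* Steps 2-4: returns 1 if this request is the first and must connect to
 * the upstream, 0 if it joined a download in progress, and -1 if there is
 * no room (the caller should then proxy normally). */
int reg_join(registry_t *r, const char *key, int request_id)
{
    int         i = reg_find(r, key);
    download_t *dl;

    if (i < 0) {
        if (r->n == MAX_DOWNLOADS || strlen(key) >= KEY_LEN) {
            return -1;
        }
        i = -i - 1;   /* insertion point that keeps the array sorted */
        memmove(&r->d[i + 1], &r->d[i], (size_t) (r->n - i) * sizeof(download_t));
        r->n++;
        dl = &r->d[i];
        memset(dl, 0, sizeof(*dl));
        strcpy(dl->key, key);
    } else {
        dl = &r->d[i];
        if (dl->nwaiters == MAX_WAITERS) {
            return -1;
        }
    }

    dl->waiters[dl->nwaiters].request_id = request_id;
    dl->waiters[dl->nwaiters].last_offset = 0;
    dl->nwaiters++;
    return dl->nwaiters == 1;
}

/* Steps 6-7: pending[i] receives the bytes waiter i can be sent now, i.e.
 * [last_offset, file_offset); last_offset is advanced on the assumption
 * that the send is accepted in full. Returns true once the upstream is
 * done, i.e. every waiter has been sent the whole file and all requests
 * can be finalized. */
bool reg_poll(download_t *dl, long long pending[])
{
    for (int i = 0; i < dl->nwaiters; i++) {
        pending[i] = dl->file_offset - dl->waiters[i].last_offset;
        dl->waiters[i].last_offset = dl->file_offset;
    }
    return dl->upstream_done;
}
```

A request that joins late starts with last_offset = 0, so its first poll covers everything already in the temp file. This is how late or slow clients read from the file instead of waiting on the first request's memory buffers. Removing the entry after finalization (a memmove in the other direction) is left out.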
>>>>
>>>>
>>>> Below you can see the ngx_http_upstream_cache_send_partial code:
>>>>
>>>>
>>>> /////////////
>>>> static ngx_int_t
>>>> ngx_http_upstream_cache_send_partial(ngx_http_request_t *r,
>>>> ngx_temp_file_t *file, off_t offset, off_t bytes, unsigned last_buf)
>>>> {
>>>>     ngx_buf_t         *b;
>>>>     ngx_chain_t        out;
>>>>
>>>>     /* we need to allocate all before the header would be sent */
>>>>
>>>>     b = ngx_pcalloc(r->pool, sizeof(ngx_buf_t));
>>>>     if (b == NULL) {
>>>>         return NGX_HTTP_INTERNAL_SERVER_ERROR;
>>>>     }
>>>>
>>>>     b->file = ngx_pcalloc(r->pool, sizeof(ngx_file_t));
>>>>     if (b->file == NULL) {
>>>>         return NGX_HTTP_INTERNAL_SERVER_ERROR;
>>>>     }
>>>>
>>>>     /* FIX: need to run ngx_http_send_header(r) once... */
>>>>
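>>>>     /* file_pos and file_last are absolute positions in the temp file,
>>>>      * so the range to send ends at offset + bytes */
>>>>     b->file_pos = offset;
>>>>     b->file_last = offset + bytes;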
>>>>
>>>>     b->in_file = 1;
>>>>     b->last_buf = last_buf;
>>>>     b->last_in_chain = 1;
>>>>
>>>>     b->file->fd = file->file.fd;
>>>>     b->file->name = file->file.name;
>>>>     b->file->log = r->connection->log;
>>>>
>>>>     out.buf = b;
>>>>     out.next = NULL;
>>>>
>>>>     return ngx_http_output_filter(r, &out);
>>>> }
>>>> ////////////
>>>>
>>>> My second question is: could I just change ngx_event_pipe to send to
>>>> all requests (instead of to one request)? And, if so, can
>>>> ngx_http_output_filter be used to send a big chunk the first time
>>>> (300 MB or more) and small chunks after that?
>>>>
>>>>
>>> Use smaller chunks.
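Following the "smaller chunks" advice, a pending range [from, to) can be cut into pieces before each piece is handed to a send_partial()-style function. This sketch assumes the convention where one buffer covers file_pos = offset to file_last = offset + bytes. `piece_t`, `split_range()`, and any particular chunk size are hypothetical; the thread does not recommend a size:

```c
/* Hypothetical (offset, bytes) pair for one send_partial()-style call:
 * the buffer would cover file_pos = offset .. file_last = offset + bytes. */
typedef struct {
    long long offset;
    long long bytes;
} piece_t;

/* Split the pending temp-file range [from, to) into pieces of at most
 * max_chunk bytes. Writes at most cap pieces to out and returns how many
 * were written; whatever does not fit is left for the next poll. */
int split_range(long long from, long long to, long long max_chunk,
                piece_t *out, int cap)
{
    int n = 0;

    if (max_chunk <= 0) {
        return 0;          /* avoid an endless loop on a bad chunk size */
    }

    while (from < to && n < cap) {
        long long len = to - from;

        if (len > max_chunk) {
            len = max_chunk;
        }
        out[n].offset = from;
        out[n].bytes = len;
        from += len;
        n++;
    }
    return n;
}
```

Capping both the piece size and the number of pieces per poll keeps one fast upstream from queuing hundreds of megabytes for a single client in one go.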
>>>
>>>> Thanks in advance for your attention :-)
>>>>
>>>> [1] I know that a "polling event" is a bad approach with NGINX, but I
>>>> don't know how to fix this. For example, the upstream download can be
>>>> very fast, and I may need to send data to the downstream in small
>>>> chunks. Upstream handling (in NGINX) is socket-event based, but when
>>>> the download from the upstream finishes, which event can I expect?
>>>>
>>>> Regards.
>>>> --
>>>> Alex Garzão
>>>> Software Designer
>>>> Azion Technologies
>>>> alex.garzao (at) azion.com
>>>>
>>>> _______________________________________________
>>>> nginx-devel mailing list
>>>> nginx-devel at nginx.org
>>>> http://mailman.nginx.org/mailman/listinfo/nginx-devel
>>>>
>>>
>>> You are on the right track. Just keep digging. Do not forget to turn off
>>> this feature for flv or mp4 seeking, partial requests, and
>>> content-encoding other than identity, because otherwise you will send
>>> broken files to the browsers.
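The exclusions listed above could be expressed as a single predicate, checked before a request joins a shared download. The `req_info_t` fields are hypothetical summaries of the request and response. For example, `has_seek_arg` stands for a `start=`-style flv/mp4 seek argument. The Content-Encoding comparison is case-sensitive for brevity:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical view of the request/response fields that matter here. */
typedef struct {
    bool        has_range;         /* Range header present (partial request) */
    bool        has_seek_arg;      /* flv/mp4 seek argument present */
    const char *content_encoding;  /* upstream Content-Encoding, NULL if none */
} req_info_t;

/* Returns false for the cases listed above: flv/mp4 seeks, partial
 * requests, and any content-encoding other than identity. */
bool sharing_allowed(const req_info_t *r)
{
    if (r->has_range || r->has_seek_arg) {
        return false;
    }
    return r->content_encoding == NULL
           || strcmp(r->content_encoding, "identity") == 0;
}
```

Requests rejected here would simply fall back to the normal, unshared proxy path.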