Sharing data when downloading the same object from upstream

Anatoli Marinov a.marinov at ucdn.com
Mon Sep 2 06:01:51 UTC 2013


The patch is not on the mailing list. We just spoke about the same problem
before on the list with other developers. Unfortunately, I cannot share the
patch because it was made for a commercial project. However, I am going
to ask for permission to share it.


On Fri, Aug 30, 2013 at 12:04 PM, SplitIce <mat999 at gmail.com> wrote:

> Is the patch on this mailing list (forgive me, I can't see it)?
>
> I'll happily test it for you, although for me to get any personal benefit
> there would need to be a size restriction, since 99.9% of requests are just
> small HTML documents and would not benefit. Also, the standard caching
> behaviour (headers that result in a cache miss, e.g. cookies, Cache-Control)
> would have to be correct.
>
> At the very least I'll read over it and see if I spot anything / have
> recommendations.
>
> Regards,
> Mathew
>
>
> On Fri, Aug 30, 2013 at 6:25 PM, Anatoli Marinov <a.marinov at ucdn.com> wrote:
>
>> I discussed the idea here on the mailing list years ago, but none of the
>> main developers liked it. However, I developed a patch; we have had it in
>> production for more than a year and it works fine.
>>
>> Just consider the following case:
>> You have a new file which is 1 GB and is located far from the cache. Even
>> so, you can download it at 5 MB/s through the cache's upstream, so you need
>> 200 seconds to get it. This file is a video file, and because it is new it
>> is placed on the front page. In the first 30 seconds your caching server may
>> receive 1000 requests (or even more) for this file, and you cannot block
>> all new requests for 170 seconds while waiting for the file to finish
>> downloading. So all requests will be sent to the origin, and your proxy
>> will generate 1 TB of traffic instead of 1 GB.
>>
>> It would be amazing if this feature were implemented as part of the
>> common caching mechanism.
>>
>>
>>
>> On Fri, Aug 30, 2013 at 11:42 AM, SplitIce <mat999 at gmail.com> wrote:
>>
>>> This is an interesting idea. While I don't see it being all that useful
>>> for most applications, there are some that could really benefit (large-file
>>> proxying first comes to mind). If it could be achieved without introducing
>>> too much CPU overhead in keeping track of the requests and available
>>> parts, it would be quite interesting.
>>>
>>> I would like to see an option to supply a minimum size to restrict this
>>> feature to (either by adding to the map/rbtree after x bytes have been
>>> transferred, or based on the Content-Length).
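>>>
>>> Roughly, as a sketch (min_share_size is a made-up value here, not an
>>> existing nginx directive; content_length_n is where the upstream's
>>> Content-Length ends up):
>>>
>>> /////////////
>>> /* Sketch only: share a download between requests only when the
>>>  * response is known to be big enough to be worth the bookkeeping. */
>>> static ngx_uint_t
>>> worth_sharing(ngx_http_upstream_t *u, off_t min_share_size)
>>> {
>>>     /* content_length_n is -1 when no Content-Length header was sent */
>>>     return u->headers_in.content_length_n != -1
>>>            && u->headers_in.content_length_n >= min_share_size;
>>> }
>>> /////////////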
>>>
>>> Regards,
>>> Mathew
>>>
>>>
>>> On Fri, Aug 30, 2013 at 6:01 PM, Anatoli Marinov <a.marinov at ucdn.com> wrote:
>>>
>>>> Hello,
>>>>
>>>>
>>>> On Wed, Aug 28, 2013 at 7:56 PM, Alex Garzão <alex.garzao at azion.com> wrote:
>>>>
>>>>> Hello Anatoli,
>>>>>
>>>>> Thanks for your reply. I will appreciate (a lot) your help :-)
>>>>>
>>>>> I'm trying to fix the code with the following requirements in mind:
>>>>>
>>>>> 1) We have upstreams/downstreams with good (and bad) links; in
>>>>> general, the upstream speed is higher than the downstream speed but, in
>>>>> some situations, the downstream is a lot faster than the upstream;
>>>>>
>>>> I think this is asynchronous: if the upstream is faster than the
>>>> downstream, it saves the data to the cached file faster, and the
>>>> downstream gets the data from the file instead of from the memory buffers.
>>>>
>>>>
>>>>> 2) I'm trying to decouple the upstream speed from the downstream
>>>>> speed. The first request (the request that actually connects to the
>>>>> upstream) downloads data to the temp file, but no longer sends data to
>>>>> its downstream. I disabled this because, in my understanding, if the
>>>>> first request has a slow downstream, all other downstreams would have
>>>>> to wait for data to be sent to this slow downstream first.
>>>>>
>>>> I think this is not necessary.
>>>>
>>>>
>>>>>
>>>>> My first question is: do I need to worry about the downstream/upstream
>>>>> speeds?
>>>>>
>>>> No.
>>>>
>>>>
>>>>> Well, I will try to explain what I did in the code:
>>>>>
>>>>> 1) I created an rbtree (current_downloads) that keeps the downloads in
>>>>> progress (one rbtree per upstream). Each node keeps the first request
>>>>> (the request that actually connects to the upstream) and a list
>>>>> (download_info_list) whose entries keep two fields: (a) a request
>>>>> waiting for data from the temp file and (b) the file offset already
>>>>> sent from the temp file (last_offset). A structure sketch follows
>>>>> after this list;
>>>>>
>>>>>
>>>> I have the same, but in an ordered array (a simpler implementation);
>>>> anyway, an rbtree will do the same job. But this structure should be in
>>>> shared memory, because all workers should know which files are currently
>>>> being downloaded from the upstream. The temp files should exist in the
>>>> tmp directory.
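>>>>
>>>> For illustration, registering such a zone could look roughly like this
>>>> (the zone name, the size, and the ngx_http_my_module tag are assumptions
>>>> for the sketch, not my actual patch):
>>>>
>>>> /////////////
>>>> static ngx_int_t
>>>> init_downloads_zone(ngx_shm_zone_t *shm_zone, void *data)
>>>> {
>>>>     ngx_slab_pool_t    *shpool;
>>>>     ngx_rbtree_t       *tree;
>>>>     ngx_rbtree_node_t  *sentinel;
>>>>
>>>>     if (data) {                        /* reload: reuse the old tree */
>>>>         shm_zone->data = data;
>>>>         return NGX_OK;
>>>>     }
>>>>
>>>>     shpool = (ngx_slab_pool_t *) shm_zone->shm.addr;
>>>>
>>>>     tree = ngx_slab_alloc(shpool, sizeof(ngx_rbtree_t));
>>>>     sentinel = ngx_slab_alloc(shpool, sizeof(ngx_rbtree_node_t));
>>>>     if (tree == NULL || sentinel == NULL) {
>>>>         return NGX_ERROR;
>>>>     }
>>>>
>>>>     ngx_rbtree_init(tree, sentinel, ngx_rbtree_insert_value);
>>>>     shm_zone->data = tree;     /* guard all access with shpool->mutex */
>>>>
>>>>     return NGX_OK;
>>>> }
>>>>
>>>> /* in the module's configuration handler: */
>>>> ngx_str_t        name = ngx_string("current_downloads");
>>>> ngx_shm_zone_t  *zone;
>>>>
>>>> zone = ngx_shared_memory_add(cf, &name, 16 * ngx_pagesize,
>>>>                              &ngx_http_my_module);
>>>> if (zone == NULL) {
>>>>     return NGX_CONF_ERROR;
>>>> }
>>>> zone->init = init_downloads_zone;
>>>> /////////////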
>>>>
>>>>
>>>>> 2) In ngx_http_upstream_init_request(), when the object isn't in the
>>>>> cache, before connecting to the upstream, I check whether the object is
>>>>> in the rbtree (current_downloads);
>>>>>
>>>>> 3) When the object isn't in current_downloads, I add a node that
>>>>> contains the first request (equal to the current request) and I add the
>>>>> current request to download_info_list. Besides that, I create a
>>>>> timer event (polling) that will check all requests in
>>>>> download_info_list and verify whether there is data in the temp file
>>>>> that has not yet been sent to the downstream. I create one timer event
>>>>> per object [1];
>>>>>
>>>>> 4) When the object is in current_downloads, I add the request to
>>>>> download_info_list and finalize ngx_http_upstream_init_request() (I
>>>>> just return, without executing ngx_http_upstream_finalize_request());
>>>>>
>>>>> 5) I have disabled (in ngx_event_pipe) the code that sends data to
>>>>> the downstream (requirement 2);
>>>>>
>>>>> 6) In the polling event, I get the current temp file offset
>>>>> (first_request->upstream->pipe->temp_file->offset) and I check in
>>>>> download_info_list whether it is greater than last_offset. If true, I
>>>>> send more data to the downstream with
>>>>> ngx_http_upstream_cache_send_partial (code below; a sketch of the
>>>>> polling handler itself follows it);
>>>>>
>>>>> 7) In the polling event, when pipe->upstream_done ||
>>>>> pipe->upstream_eof || pipe->upstream_error, and all data has been sent
>>>>> to the downstream, I execute ngx_http_upstream_finalize_request for all
>>>>> requests;
>>>>>
>>>>> 8) I added a bit flag (first_download_request) to the
>>>>> ngx_http_request_t struct to prevent the first request from being
>>>>> finished before all requests have completed; I check this flag in
>>>>> ngx_http_upstream_finalize_request(). But, honestly, I'm not sure
>>>>> whether it is necessary to avoid this situation...
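>>>>>
>>>>> To make the bookkeeping in (1) and (3) concrete, the structures could
>>>>> look roughly like this (illustrative names and fields, not the actual
>>>>> code):
>>>>>
>>>>> /////////////
>>>>> typedef struct {
>>>>>     ngx_queue_t          queue;        /* link in download_info_list */
>>>>>     ngx_http_request_t  *request;      /* downstream waiting for data */
>>>>>     off_t                last_offset;  /* temp-file bytes already sent */
>>>>> } download_info_t;
>>>>>
>>>>> typedef struct {
>>>>>     ngx_rbtree_node_t    node;           /* keyed by the cache key hash */
>>>>>     ngx_http_request_t  *first_request;  /* the one talking to upstream */
>>>>>     ngx_queue_t          download_info_list;
>>>>>     ngx_event_t          poll_event;     /* per-object timer from (3) */
>>>>> } current_download_t;
>>>>> /////////////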
>>>>>
>>>>>
>>>>> Below you can see the ngx_http_upstream_cache_send_partial code:
>>>>>
>>>>>
>>>>> /////////////
>>>>> /* Send the byte range [offset, bytes) of the cache temp file to the
>>>>>  * downstream; 'bytes' is used as an absolute end offset in the file,
>>>>>  * and 'last_buf' marks the final chunk of the response. */
>>>>> static ngx_int_t
>>>>> ngx_http_upstream_cache_send_partial(ngx_http_request_t *r,
>>>>>     ngx_temp_file_t *file, off_t offset, off_t bytes, unsigned last_buf)
>>>>> {
>>>>>     ngx_buf_t         *b;
>>>>>     ngx_chain_t        out;
>>>>>     ngx_http_cache_t  *c;
>>>>>
>>>>>     c = r->cache;
>>>>>
>>>>>     /* we need to allocate all before the header would be sent */
>>>>>
>>>>>     b = ngx_pcalloc(r->pool, sizeof(ngx_buf_t));
>>>>>     if (b == NULL) {
>>>>>         return NGX_HTTP_INTERNAL_SERVER_ERROR;
>>>>>     }
>>>>>
>>>>>     b->file = ngx_pcalloc(r->pool, sizeof(ngx_file_t));
>>>>>     if (b->file == NULL) {
>>>>>         return NGX_HTTP_INTERNAL_SERVER_ERROR;
>>>>>     }
>>>>>
>>>>>     /* FIX: need to run ngx_http_send_header(r) once... */
>>>>>
>>>>>     b->file_pos = offset;   /* first byte not yet sent downstream */
>>>>>     b->file_last = bytes;   /* current end offset of the temp file */
>>>>>
>>>>>     b->in_file = 1;
>>>>>     b->last_buf = last_buf;
>>>>>     b->last_in_chain = 1;
>>>>>
>>>>>     b->file->fd = file->file.fd;
>>>>>     b->file->name = file->file.name;
>>>>>     b->file->log = r->connection->log;
>>>>>
>>>>>     out.buf = b;
>>>>>     out.next = NULL;
>>>>>
>>>>>     return ngx_http_output_filter(r, &out);
>>>>> }
>>>>> ////////////
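>>>>>
>>>>> And, for completeness, the polling handler from (6) and (7) might look
>>>>> roughly like this, using the illustrative structures sketched above
>>>>> (error handling and the finalization from (7) omitted):
>>>>>
>>>>> /////////////
>>>>> static void
>>>>> download_poll_handler(ngx_event_t *ev)
>>>>> {
>>>>>     current_download_t  *d = ev->data;
>>>>>     ngx_event_pipe_t    *p = d->first_request->upstream->pipe;
>>>>>     off_t                end = p->temp_file->offset;
>>>>>     unsigned             done;
>>>>>     ngx_queue_t         *q;
>>>>>     download_info_t     *di;
>>>>>
>>>>>     done = p->upstream_done || p->upstream_eof || p->upstream_error;
>>>>>
>>>>>     for (q = ngx_queue_head(&d->download_info_list);
>>>>>          q != ngx_queue_sentinel(&d->download_info_list);
>>>>>          q = ngx_queue_next(q))
>>>>>     {
>>>>>         di = ngx_queue_data(q, download_info_t, queue);
>>>>>
>>>>>         if (end > di->last_offset) {
>>>>>             /* (6): push the newly written bytes to this downstream */
>>>>>             ngx_http_upstream_cache_send_partial(di->request,
>>>>>                                                  p->temp_file,
>>>>>                                                  di->last_offset,
>>>>>                                                  end, done);
>>>>>             di->last_offset = end;
>>>>>         }
>>>>>     }
>>>>>
>>>>>     if (!done) {
>>>>>         ngx_add_timer(ev, 50);   /* re-arm; 50 ms is arbitrary here */
>>>>>         return;
>>>>>     }
>>>>>
>>>>>     /* (7): finalize all waiting requests here */
>>>>> }
>>>>> /////////////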
>>>>>
>>>>> My second question is: could I just fix ngx_event_pipe to send data
>>>>> to all requests (instead of sending to only one)? And, if so, can
>>>>> ngx_http_output_filter be used to send a big chunk the first time
>>>>> (300 MB or more) and little chunks after that?
>>>>>
>>>>>
>>>> Use smaller chunks.
>>>>
>>>>> Thanks in advance for your attention :-)
>>>>>
>>>>> [1] I know that a polling event is a bad approach with NGINX, but I
>>>>> don't know how to avoid it. For example, the upstream download can be
>>>>> very quick, and it is possible that I need to send data to the
>>>>> downstream in little chunks. The upstream (in NGINX) is socket-event
>>>>> based, but when the download from the upstream finishes, which event
>>>>> can I expect?
>>>>>
>>>>> Regards.
>>>>> --
>>>>> Alex Garzão
>>>>> Software Designer
>>>>> Azion Technologies
>>>>> alex.garzao (at) azion.com
>>>>>
>>>>
>>>> You are on the right track. Just keep digging. Do not forget to turn
>>>> off this feature when you have flv or mp4 seek, partial (range)
>>>> requests, or a Content-Encoding other than identity, because otherwise
>>>> you will send broken files to the browsers.
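>>>>
>>>> A sketch of such a guard (the "start=" check for flv/mp4
>>>> pseudo-streaming seeks is an assumption about how those modules are
>>>> typically used):
>>>>
>>>> /////////////
>>>> static ngx_flag_t
>>>> download_can_be_shared(ngx_http_request_t *r)
>>>> {
>>>>     if (r->headers_in.range) {
>>>>         return 0;    /* partial (Range) request */
>>>>     }
>>>>
>>>>     if (r->args.len
>>>>         && ngx_strnstr(r->args.data, "start=", r->args.len) != NULL)
>>>>     {
>>>>         return 0;    /* flv/mp4 seek via the "start" argument */
>>>>     }
>>>>
>>>>     /* the Content-Encoding check has to happen on the upstream
>>>>      * response (u->headers_in.content_encoding) before sharing */
>>>>
>>>>     return 1;
>>>> }
>>>> /////////////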
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>

