Anatoli Marinov toli at
Fri Jan 13 12:44:37 UTC 2012


I am going to try to produce a solution for the issue I posted 
yesterday. It was about proxy_cache and 1000 requests for a non-cached big 
object. The behavior is suboptimal: the object will be downloaded 
many times before being cached.
So I need a direction. At the moment I am digging in the source.
Most of the code that deals with caching is located in 
ngx_http_file_cache. If the object is not found in the cache it is 
downloaded from upstream, and that logic is implemented in ngx_event_pipe. 
As I understand it, ngx_event_pipe provides functionality for 
asynchronously downloading from upstream and sending content to the client. 
A file in the tmp directory is used as a buffer.
When the file is fully downloaded from upstream it is moved into the 
cache directory. The interesting window is while the file is still in the tmp directory.
There is no shared structure that keeps a list of the current 
simultaneous upstream requests, nor of event_pipe instances.
We may need a shared structure listing the upstreams that are currently 
writing to temp files. For a new request, if the object is not cached, we 
should first check this list of current temp files. The list has to be 
placed in shared memory because many workers have to use it.
If the object is not cached but part of it already exists in a temp file, 
it should be streamed from there (without the event_pipe functionality?).
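To make the idea concrete, here is a rough sketch of such a shared "in-flight downloads" table. All names here (inflight_entry, inflight_find, and so on) are hypothetical; a real implementation would put the table in an nginx shared memory zone guarded by a shared mutex so all workers see it, while a plain static array and a pthread mutex stand in for that here:

```c
#include <string.h>
#include <pthread.h>

#define MAX_INFLIGHT 64
#define KEY_LEN      128
#define PATH_LEN     256

/* One entry per object currently being fetched from upstream. */
typedef struct {
    char   key[KEY_LEN];        /* cache key of the object        */
    char   tmp_path[PATH_LEN];  /* temp file being written        */
    size_t bytes_written;       /* how much of the body is on disk */
    int    in_use;
} inflight_entry;

static inflight_entry  table[MAX_INFLIGHT];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Look up an in-progress download by cache key; NULL if none. */
inflight_entry *inflight_find(const char *key)
{
    inflight_entry *found = NULL;
    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < MAX_INFLIGHT; i++) {
        if (table[i].in_use && strcmp(table[i].key, key) == 0) {
            found = &table[i];
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return found;
}

/* Register a new download; NULL if the table is full. */
inflight_entry *inflight_register(const char *key, const char *tmp_path)
{
    inflight_entry *e = NULL;
    pthread_mutex_lock(&table_lock);
    for (int i = 0; i < MAX_INFLIGHT; i++) {
        if (!table[i].in_use) {
            e = &table[i];
            e->in_use = 1;
            e->bytes_written = 0;
            strncpy(e->key, key, KEY_LEN - 1);
            e->key[KEY_LEN - 1] = '\0';
            strncpy(e->tmp_path, tmp_path, PATH_LEN - 1);
            e->tmp_path[PATH_LEN - 1] = '\0';
            break;
        }
    }
    pthread_mutex_unlock(&table_lock);
    return e;
}

/* Remove the entry when the download completes or is aborted. */
void inflight_remove(inflight_entry *e)
{
    pthread_mutex_lock(&table_lock);
    e->in_use = 0;
    pthread_mutex_unlock(&table_lock);
}
```

A new request would call inflight_find() before going to upstream; on a hit it could stream from the temp file named in the entry instead of opening a second upstream connection.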

So I am just sharing some ideas.

Would it be a good idea for the files in the tmp directory to have a 
hashed name, like the files in the cache directory do? In that case we 
might not need a shared list of upstreams.
Do we need a shared list of the upstreams that have a pending 
download to the tmp dir?
Do we need a shared list of event_pipe instances?
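As a rough illustration of the hashed-name question: if the temp file name were derived deterministically from the cache key, every worker could compute the same path without consulting any shared list. nginx derives cache file names from an MD5 of the key; the tiny FNV-1a hash below is only a compact stand-in for that, and the /tmp/nginx prefix and function names are made up:

```c
#include <stdio.h>
#include <stdint.h>

/* FNV-1a 64-bit hash: a stand-in for the MD5 nginx really uses. */
static uint64_t fnv1a64(const char *s)
{
    uint64_t h = 1469598103934665603ULL;   /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char) *s++;
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/* Build "/tmp/nginx/<hash>" into buf; the same key always maps
 * to the same path, in every worker, with no shared state. */
void tmp_path_for_key(const char *key, char *buf, size_t len)
{
    snprintf(buf, len, "/tmp/nginx/%016llx",
             (unsigned long long) fnv1a64(key));
}
```

With such a scheme, a worker that finds the temp file already present on disk knows another worker is (or was) downloading the same object.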

All ideas are welcome!!!

Anatoli Marinov

On 01/12/2012 02:50 PM, Anatoli Marinov wrote:

> Hello,
> See my answer below.
> On 01/12/2012 02:34 PM, Maxim Dounin wrote:
>> Hello!
>> On Thu, Jan 12, 2012 at 11:49:23AM +0200, Anatoli Marinov wrote:
>>> I know this configuration variable. It was added by Maxim last
>>> month in unstable (as I remember, but I am not absolutely sure). It
>>> seems to be a workaround and will not solve the problem. I think it
>>> is unusable.
>>> If we use it for the same case:
>>> In the first second A receives 1000 requests. Only 1 request will be
>>> sent to B, for the first request that A receives. The other 999 will
>>> wait, for example, 5 seconds. The link between A and B is 1 MB per second,
>>> and in 5 seconds A may receive 5 MB of data, so after 5 seconds the 999
>>> requests will be sent to B.
>>> Is it right?
>> Yes.  The remaining questions are: are you serving "many big files –
>> 1GB – 2GB" over a 1MB/s link?  And are 1000 simultaneous requests to the
>> same file a likely situation in your workload?  If yes, you may
>> reconsider your network configuration.
> The bandwidth between A and B is not important. If the link is 100 
> Mbps or even 1 Gbps the issue will be hit again! Actually our links are 
> bigger. I just wanted to illustrate the picture better.
>> You may also try to tune proxy_cache_lock_timeout (default is set
>> low enough to ensure minimal QoS impact), but it isn't likely to
>> help much in the particular situation described.
> Yes. There is no value that may help me.
>> Ideally (for big files use case), we should be able to stream the
>> same response to all clients requesting the file (while
>> downloading it from upstream), but this isn't likely to happen
>> soon.
> Yes! The only right solution could be streaming from temporary file.
>> In the relatively near term the plan is to improve the cache lock
>> mechanism to make it possible to switch off caching (and thus save some
>> disk resources) in case of a lock timeout.
>> Maxim Dounin
>>> On 01/12/2012 11:33 AM, Andrew Alexeev wrote:
>>>> Check this one, pls :)
>>>> On Jan 12, 2012, at 1:32 PM, Anatoli Marinov wrote:
>>>>> Hello Colleagues,
>>>>> I found a performance issue with proxy_cache module.
>>>>> For example I have installed 2 servers with nginx-1.0.10. The first
>>>>> one (A) works as a reverse proxy and the second one (B) is a
>>>>> storage with many big files – 1GB – 2GB.
>>>>> The link between A and B, for example, may serve 1 MBps.
>>>>> There is a new object on B and it is not yet cached on A.
>>>>> Let us assume this is a hot new object and A receives 1000
>>>>> requests for it within 3 seconds. Because the object is not cached,
>>>>> the requests will pass through upstream to B and the incoming 1000
>>>>> streams will be saved on A in the tmp directory as separate files.
>>>>> After each request has completed, its file from the tmp directory
>>>>> will be moved to the cache directory: 1000 identical operations for
>>>>> one and the same object. In addition, every copy will be cached
>>>>> slowly because there are 999 other streams.
>>>>> This 1 GB object will be downloaded 1000 times before it may be
>>>>> cached, and this is not optimal at all.
>>>>> Am I missing something? Could it be a configuration issue on my part?
>>>>> Is there a solution for that?
>>>>> Cheers
>>>>> Anatoli Marinov
>>>>> _______________________________________________
>>>>> nginx-devel mailing list
>>>>> nginx-devel at
