HEAD request to GCS caching body

Señor J Onion senor.j.onion at gmail.com
Thu Mar 4 14:03:40 UTC 2021

I use nginx as a forward proxy, with content caching.

My app first performs a HEAD request to a Google Cloud Storage object. Then it may perform a GET request to the same object.

The HEAD request (which comes first) causes a cache MISS. The content body length returned to the client is 0 (which is obviously correct).

However, I think the upstream (GCS) response to that HEAD request still includes the object in its body. I have two reasons for believing this:

a) When I subsequently send the GET request, I don't get a cache MISS (even though it is my first GET request for that object) but a cache REVALIDATED. The upstream response is just a 304 with no body, saying the cached object is still valid ($upstream_header_time and $upstream_response_time are identical, both 0.421, which is what I would expect for a bodiless 304).
So it looks as if the initial HEAD request was cached as though it were a GET, complete with the object body that seems to have been in the upstream response to the HEAD.
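
For reference, nginx's documented default for proxy_cache_key is shown below (this is the default, not a change). It contains no request method, so a HEAD and a GET for the same URL resolve to the same cache entry, which would be consistent with observation (a):

      # Documented default cache key: scheme, proxied host and request URI.
      # The request method is not part of it.
      proxy_cache_key $scheme$proxy_host$request_uri;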

b) Also, for the initial HEAD request I see $upstream_header_time == 0.832 and $upstream_response_time == 2.459. A HEAD response should have no body, so I would expect the two values to be identical. The difference of about 1.6 s suggests that a body was transferred from upstream, even though the response that reaches the client looks correct (its response.body.length is indeed 0).
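
To see what nginx actually received from upstream for each request, a log format along these lines could be added (the format name cache_debug and the log path are hypothetical). A non-zero upstream_bytes value on the first HEAD request would support the suspicion that a body came back from GCS:

      # In the http {} context: log method/URI, cache status, upstream status,
      # upstream timings and the upstream response length.
      log_format cache_debug '$remote_addr [$time_local] "$request" $status '
                             'cache=$upstream_cache_status upstream=$upstream_status '
                             'hdr=$upstream_header_time rsp=$upstream_response_time '
                             'upstream_bytes=$upstream_response_length';

      # In the server {} block (hypothetical log path):
      access_log /var/log/nginx/cache_debug.log cache_debug;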

This behaviour is interfering with my app and my HTTP analytics, and I believe it is incorrect.

I don't know where the "error" lies. Is it a Google Cloud Storage bug that sends the object in the body of the HEAD response, or does the issue lie with nginx itself, with my configuration, or with nginx's content caching specifically?
Or perhaps it is behaving exactly as it should, and there is something about HEAD/GET requests in combination with caching that I am not understanding.

Any help to shed light on this strange behaviour would be greatly appreciated.

My server block config is as follows:

      proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=image_cache:10m inactive=60d use_temp_path=off;

      server {
         listen 3128;

         location / {
            proxy_cache image_cache;

            proxy_cache_revalidate on;

            proxy_cache_lock on;
            proxy_cache_lock_timeout 5s;

            proxy_ignore_headers Cache-Control;
            proxy_cache_valid 200 60d;

            add_header X-Cache-Status $upstream_cache_status;

            # resolver also needs at least one DNS server address before ipv6=off
            resolver ipv6=off;
            proxy_pass http://$http_host$uri$is_args$args;
         }
      }
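
One nginx setting that may be relevant here is proxy_cache_convert_head. According to the nginx documentation it is on by default and converts HEAD requests to GET for caching, which would account for both the body transfer in (b) and the REVALIDATED result in (a). Below is a sketch of disabling it, assuming that is the cause. The documentation says the cache key should then include $request_method:

      # Inside location / { ... }: send HEAD upstream as HEAD rather than GET.
      proxy_cache_convert_head off;

      # Keep HEAD and GET responses in separate cache entries.
      proxy_cache_key $request_method$scheme$proxy_host$request_uri;

With this change, the first GET for an object would be expected to show a MISS rather than REVALIDATED, since the HEAD response would no longer populate the GET entry.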