I was wondering what a good way would be to spawn off processes that modify cached files, for operations that would take too long to perform as inline filters. The practical example I have is running the yui-compressor on JS and CSS files.
I was thinking that this could be done directly on the cached files themselves, as a process completely decoupled from nginx.
Keep in mind that cached files contain metadata, so you cannot just push them through yui-compressor.
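The following minimal Python sketch shows why the files cannot be passed through as-is. It separates the metadata from the body, assuming a proxy_cache entry laid out as follows: a binary header, a "KEY: ..." line, the raw upstream response headers ending in a blank line, and then the body. That layout is an assumption and may differ between nginx versions and cache types. split_cache_file is a hypothetical helper, not an nginx API.

```python
def split_cache_file(data: bytes) -> tuple[bytes, bytes]:
    """Split a cached entry into (metadata, body).

    Assumed layout (proxy_cache, not verified for every nginx version):
    binary header, "\\nKEY: <key>\\n", raw upstream response headers
    terminated by "\\r\\n\\r\\n", then the response body.
    """
    key_pos = data.find(b"\nKEY: ")
    if key_pos < 0:
        raise ValueError("no KEY line found; not a proxy_cache entry?")
    key_end = data.find(b"\n", key_pos + 1)
    if key_end < 0:
        raise ValueError("truncated KEY line")
    # The stored upstream headers end with an empty line.
    hdr_end = data.find(b"\r\n\r\n", key_end)
    if hdr_end < 0:
        raise ValueError("end of stored response headers not found")
    body_start = hdr_end + 4
    return data[:body_start], data[body_start:]
```

Only the returned body would be fed to yui-compressor. Even then, a stored Content-Length header would no longer match the compressed body, so the metadata cannot simply be reused unchanged.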
Other than that, using a dedicated task queue seems a lot more appropriate than doing this inside nginx.
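A dedicated task queue would normally be an external job system. As a stand-in, this sketch uses Python's standard queue.Queue with a few worker threads. run_workers and the process callback are hypothetical names, not part of nginx. The callback is where yui-compressor would be invoked on one cached file.

```python
import queue
import threading
from typing import Callable, Iterable, Optional


def run_workers(paths: Iterable[str], process: Callable[[str], None],
                num_workers: int = 2) -> list[str]:
    """Run process(path) for every path on a small thread pool.

    Returns the paths whose processing raised an exception.
    """
    tasks: "queue.Queue[Optional[str]]" = queue.Queue()
    failed: list[str] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            path = tasks.get()
            if path is None:  # sentinel: no more work for this worker
                tasks.task_done()
                return
            try:
                process(path)  # e.g. compress one cached .js/.css file
            except Exception:
                with lock:
                    failed.append(path)
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for p in paths:
        tasks.put(p)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return failed
```

Because the workers run outside nginx, a slow or failing compression only shows up in the returned list of failed paths and does not block request handling.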
But this doesn't seem ideal, and cache->sh->size would probably end up holding an incorrect value for the total cache size.
Personally, I wouldn't worry too much about cache->sh->size, but if you really do, then after you're done compressing the .js/.css files you could purge the uncompressed files from the cache using ngx_cache_purge and replace the original file on disk with the compressed one (this way nginx will pick up the compressed version from disk on the next access).
Keep in mind that in this scenario there is a (very small) chance of a race condition. Ideally, you should modify ngx_cache_purge to mark the content as "being updated" instead of "deleted" before removing it from disk, and then mark it as "updated" after you've replaced it with the compressed version.
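For the "replace on disk" step, a common technique is to write the new entry to a temporary file in the same directory and rename it over the original. A reader opening the path then sees either the old file or the new one, never a half-written file. This Python sketch assumes POSIX rename semantics. replace_atomically is a hypothetical helper. It does nothing about nginx's shared-memory cache state, so the purge and "being updated" handling described above are still needed.

```python
import os
import tempfile


def replace_atomically(path: str, new_data: bytes) -> None:
    """Replace path with new_data so readers see either old or new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory => same filesystem, which rename() needs to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_data)
            f.flush()
            os.fsync(f.fileno())  # make sure the data hits disk before the swap
        if os.path.exists(path):
            # mkstemp creates 0600 files; copy the original permission bits.
            os.chmod(tmp_path, os.stat(path).st_mode & 0o7777)
        os.replace(tmp_path, path)  # atomic rename over the original
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Processes that already have the old file open keep reading the old inode until they close it. Ownership is not copied, so if the script runs as a different user, the replacement may need a chown to stay readable by the nginx worker user.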
Would there be a way to build that directly into nginx?
You could fork(), process the cached file in the child process, and update the cache accordingly. This is far from perfect, but for a small number of tasks it should be reasonable enough. There are some caveats when forking with an "open cache", though; check ngx_slowfs_cache for details.
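Forking from inside nginx would be done in C, but the pattern is easy to see in a short Python sketch using os.fork(). The child reads the cached file, transforms it, renames the result into place, and exits, while the parent returns immediately. process_in_child, the ".tmp" suffix and the transform callback are hypothetical. The sketch ignores nginx's cache bookkeeping and the open-cache caveats mentioned above.

```python
import os
from typing import Callable


def process_in_child(path: str, transform: Callable[[bytes], bytes]) -> int:
    """Fork a child that rewrites path with transform(contents); return its pid."""
    pid = os.fork()
    if pid != 0:
        return pid  # parent: keep serving, reap the child later with os.waitpid()

    # Child process: do the slow work here, away from the parent.
    status = 1  # exit code 1 unless every step below succeeds
    try:
        with open(path, "rb") as f:
            data = f.read()
        tmp_path = path + ".tmp"  # hypothetical temp name next to the original
        with open(tmp_path, "wb") as f:
            f.write(transform(data))
        os.replace(tmp_path, path)  # swap in the result with a single rename
        status = 0
    finally:
        # os._exit skips the parent's atexit handlers and stdio flushing.
        os._exit(status)
```

The parent must eventually reap the child (for example with os.waitpid), otherwise it lingers as a zombie process.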
On a side note, you guys (CloudFlare) are talking to "remote backends", and nginx isn't really suited for that (at least not yet, without HTTP/1.1 and keep-alive support)... You might get much better results using Apache Traffic Server, especially considering your aggressive preloading ;)
Best regards, Piotr Sikora < firstname.lastname@example.org >