Add support for buffering in scripted logs

Mon Aug 7 11:08:49 UTC 2017

Sorry for the spam, I forgot to mention: one thing I know is missing from this patch is support for reopening files (USR1).
To handle that, I will probably need to add an rbtree on cycle->open_files, since the O(n) search currently performed in ngx_conf_open_file doesn't seem appropriate at runtime.
If the general concept of the patch below is approved, I will happily submit a patch for this as well.

Thanks

Eran

-----Original Message-----
From: Eran Kornblau 
Sent: Monday, August 7, 2017 1:36 PM
To: nginx-devel at nginx.org
Subject: Add support for buffering in scripted logs

Hi all,

The attached patch adds support for log buffering when using variables in the access log file name.

The use case is this: we use nginx to receive analytics beacons and write them to the access log.
We'd like to have one log file per hour that contains exactly the log lines of that hour. If we use an external script to rotate the logs, we cannot prevent log lines from slipping between adjacent files.
Accurate partitioning of the log lines is important for us, in case we need to reindex a specific hour into the database.
Scripted logs seem perfect for this, but we don't want to give up on log compression (which requires buffering), since the number of events can be high.

The patch relies on the open file cache to keep the context (buffer + flush event) of each file.
In order to do that, I added the ability to register callbacks in open file cache for these events:
1. init - new cached file object created
2. flush - the file handle of a cached file is closed
3. free - the cached file is destroyed

The context is allocated right after the ngx_cached_open_file_t struct, this was done in order to avoid increasing the size of the ngx_cached_open_file_t struct (compared to the alternative of adding some opaque pointer on this struct).
Therefore, there are no memory implications for "regular" (log = 0) open file caches. The only overhead is an extra 'if' on the creation / deletion of the cached file object (no impact at all on a cache hit, which is probably the most performance-sensitive path).
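
To illustrate the layout described above, here is a minimal standalone sketch in plain C. It is not the patch itself: cached_file_t is a hypothetical stand-in for ngx_cached_open_file_t, and cache_hooks_t, log_ctx_t and all function names are made up for illustration. It only shows the idea of placing the context in the same allocation, right after the struct, and calling the three callbacks only when hooks are registered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ngx_cached_open_file_t (the real struct differs). */
typedef struct {
    int   fd;
    char  name[64];
} cached_file_t;

/* Hypothetical callback set, one callback per event listed above. */
typedef struct {
    size_t  ctx_size;
    void  (*on_init)(cached_file_t *f, void *ctx);
    void  (*on_flush)(cached_file_t *f, void *ctx);
    void  (*on_free)(cached_file_t *f, void *ctx);
} cache_hooks_t;

/* Round the struct size up so the trailing context is suitably aligned. */
typedef union { long long ll; long double ld; void *p; } align_t;
#define CTX_OFFSET \
    (((sizeof(cached_file_t) + sizeof(align_t) - 1) / sizeof(align_t)) \
     * sizeof(align_t))

/* Only valid for files created with hooks. */
static void *cached_file_ctx(cached_file_t *f)
{
    return (char *) f + CTX_OFFSET;
}

/* Without hooks the allocation is exactly sizeof(cached_file_t). */
static cached_file_t *cached_file_create(const cache_hooks_t *hooks,
    const char *name)
{
    size_t          size;
    cached_file_t  *f;

    size = hooks ? CTX_OFFSET + hooks->ctx_size : sizeof(cached_file_t);
    f = calloc(1, size);
    if (f == NULL) {
        return NULL;
    }
    f->fd = -1;
    strncpy(f->name, name, sizeof(f->name) - 1);
    if (hooks && hooks->on_init) {
        hooks->on_init(f, cached_file_ctx(f));
    }
    return f;
}

/* The file handle is being closed: let the owner flush its context first. */
static void cached_file_close_handle(const cache_hooks_t *hooks,
    cached_file_t *f)
{
    if (hooks && hooks->on_flush) {
        hooks->on_flush(f, cached_file_ctx(f));
    }
    f->fd = -1;
}

static void cached_file_destroy(const cache_hooks_t *hooks, cached_file_t *f)
{
    if (hooks && hooks->on_free) {
        hooks->on_free(f, cached_file_ctx(f));
    }
    free(f);
}

/* Example context: a small in-memory log buffer (illustrative only). */
typedef struct {
    size_t  used;
    size_t  flushes;
    char    buf[256];
} log_ctx_t;

static void log_init(cached_file_t *f, void *data)
{
    log_ctx_t *ctx = data;
    (void) f;
    ctx->used = 0;
    ctx->flushes = 0;
}

static void log_flush(cached_file_t *f, void *data)
{
    log_ctx_t *ctx = data;
    (void) f;
    if (ctx->used) {
        /* a real implementation would write ctx->buf to f->fd here */
        ctx->used = 0;
        ctx->flushes++;
    }
}

/* Assumption of this sketch: destroying a file also flushes what is left. */
static void log_free(cached_file_t *f, void *data)
{
    log_flush(f, data);
}

static int log_append(cached_file_t *f, const char *line)
{
    log_ctx_t  *ctx = cached_file_ctx(f);
    size_t      len = strlen(line);

    if (len > sizeof(ctx->buf) - ctx->used) {
        return -1;  /* buffer full, caller has to flush first */
    }
    memcpy(ctx->buf + ctx->used, line, len);
    ctx->used += len;
    return 0;
}

static const cache_hooks_t log_hooks = {
    sizeof(log_ctx_t), log_init, log_flush, log_free
};
```

A cache created without hooks passes NULL everywhere and pays only for the 'if' checks in create / close / destroy. A cache created with log_hooks gets a log_ctx_t per cached file without any extra pointer in the struct.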

Btw, on a similar subject - can anyone explain the purpose of checking the existence of the root dir in scripted logs? The only explanation I could think of is that it provides some security protection in case $uri is used in the script, but that sounds like a very specific use case to me...
Anyway, if that is indeed the case, maybe it makes sense to add a conf directive to enable/disable this behavior?

Thank you!

Eran