Add support for buffering is scripted logs

Mon Aug 14 15:45:07 UTC 2017

Hello!

On Mon, Aug 07, 2017 at 10:36:19AM +0000, Eran Kornblau wrote:

> Hi all,
> 
> The attached patch adds support for log buffering when using variables in the access log file name.
> 
> The use case is this - we use nginx to receive analytics beacons and write them to the access log.
> We'd like to have a log file per hour that contains the logs of the specific hour. If we use some external 
> script to perform log rotate, we cannot avoid log lines slipping between adjacent files.
> It's important for us to have the log lines partitioning accurate, in case we need to reindex a specific 
> hour to the database.
> Scripted logs seem perfect for this, but we don't want to give up on log compression, since the 
> number of events can be high.
> 
> The patch relies on the open file cache to keep the context (buffer + flush event) of each file.
> In order to do that, I added the ability to register callbacks in open file cache for these events:
> 1. init - new cached file object created
> 2. flush - the file handle of a cached file is closed
> 3. free - the cached file is destroyed
> The context is allocated right after the ngx_cached_open_file_t struct, this was done in order to avoid
> increasing the size of the ngx_cached_open_file_t struct (compared to the alternative of adding some
> opaque pointer on this struct).
> Therefore, there are no memory implications for "regular" (log = 0) open file caches, the overhead is
> only an extra 'if' on the creation / deletion of the cached file object (no impact at all on cache hit,
> which is probably the most performance sensitive) 

Just a quick note: for me, the whole feature looks questionable, 
and the implementation is far from being in a commitable state.

> Btw, on a similar subject - can anyone explain the purpose of checking the existence of the root dir
> in scripted logs? Only explanation I could think of is to provide some security protection in case $uri
> is used in the script, but that sounds like a very specific use case to me... 
> Anyway, if that is indeed the case, maybe it makes sense to add conf directive to enable/disable 
> this behavior?

The basic purpose of this restriction is to protect configurations 
like

    server {
        access_log /spool/vhost/logs/$host;
        ...
    }

from creating arbitrary log files, thus allowing DoS attacks via 
inode exhaustion.  Accordingly, the documentation 
(http://nginx.org/r/access_log) recommends to specify both root 
and access_log if you want to use variables in logging:

: during each log write the existence of the request’s root 
: directory is checked, and if it does not exist the log is not 
: created. It is thus a good idea to specify both root and 
: access_log on the same level:
:
: server {
:     root       /spool/vhost/data/$host;
:     access_log /spool/vhost/logs/$host;
:     ...

-- 
Maxim Dounin
http://nginx.org/