<div dir="ltr">I tested this change with the following script:<div><br></div><div>```</div><div><div>#!/usr/bin/perl</div><div><br></div><div>###############################################################################</div><div><br></div><div>use warnings;</div><div>use strict;</div><div><br></div><div>use Test::More;</div><div><br></div><div>BEGIN { use FindBin; chdir($FindBin::Bin); }</div><div><br></div><div>use lib 'lib';</div><div>use Test::Nginx;</div><div><br></div><div>###############################################################################</div><div><br></div><div>select STDERR; $| = 1;</div><div>select STDOUT; $| = 1;</div><div><br></div><div>my $t = Test::Nginx->new()</div><div> ->has(qw/http limit_req/)</div><div> ->plan(2)</div><div> ->write_file_expand('nginx.conf', <<'EOF');</div><div><br></div><div>%%TEST_GLOBALS%%</div><div><br></div><div>daemon off;</div><div><br></div><div>events {}</div><div><br></div><div>http {</div><div> %%TEST_GLOBALS_HTTP%%</div><div><br></div><div> limit_req_zone $binary_remote_addr zone=one:1m rate=1r/m;</div><div> limit_req zone=one;</div><div><br></div><div> server {</div><div> listen <a href="http://127.0.0.1:8080">127.0.0.1:8080</a>;</div><div> server_name localhost;</div><div><br></div><div> location / {</div><div> error_log stderr;</div><div> }</div><div> }</div><div>}</div><div><br></div><div>EOF</div><div><br></div><div>open OLDERR, '>&', \*STDERR;</div><div>open STDERR, '>', $t->testdir() . '/stderr' or die "Can't reopen STDERR: $!";</div><div>open my $stderr, '<', $t->testdir() . '/stderr' or die "Can't open stderr file: $!";</div><div><br></div><div>$t->run();</div><div><br></div><div>open STDERR, '>&', \*OLDERR;</div><div><br></div><div>###############################################################################</div><div><br></div><div># charge limit_req</div><div>http_get('/');</div><div><br></div><div>http_get('/"foo bar"');</div><div>is(lines($t, 'request: "GET /\"foo bar\" HTTP/1.0"'), 1);</div><div><br></div><div>http_get_with_referrer('/', '<a href="https://www.google.com.au/search?hl=en&q=">https://www.google.com.au/search?hl=en&q=</a>"BauGutachter" +41 2016 .ch&num=100&start=0');</div><div>is(lines($t, 'referrer: "<a href="https://www.google.com.au/search?hl=en&q=\">https://www.google.com.au/search?hl=en&q=\</a>"BauGutachter\" +41 2016 .ch&num=100&start=0"'), 1);</div><div><br></div><div>###############################################################################</div><div><br></div><div>sub http_get_with_referrer {</div><div> my ($url, $referer) = @_;</div><div> return http(<<EOF);</div><div>GET $url HTTP/1.0</div><div>Host: localhost</div><div>Referer: $referer</div><div><br></div><div>EOF</div><div>}</div><div><br></div><div>sub lines {</div><div> my ($t, $pattern) = @_;</div><div> return map { $_ =~ /\Q$pattern\E/ } (<$stderr>);</div><div>}</div><div><br></div><div>###############################################################################</div></div><div>```</div></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature"><font size="4"><b>Joshua Spence</b></font><br><br><div><b>mobile:</b> +61 422 990 263</div><b>email:</b> <a href="mailto:josh@joshuaspence.com" target="_blank">josh@joshuaspence.com</a><br><div style="padding:0px;margin-left:0px;margin-top:0px;overflow:hidden;word-wrap:break-word;color:black;font-size:10px;text-align:left;line-height:130%"></div><div><b>web:</b> <a href="http://www.joshuaspence.com" target="_blank">http://www.joshuaspence.com</a></div></div></div>
<br><div class="gmail_quote">On 3 January 2017 at 17:50, Joshua Spence <span dir="ltr"><<a href="mailto:josh@joshuaspence.com" target="_blank">josh@joshuaspence.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""># HG changeset patch<br>
# User Joshua Spence <<a href="mailto:josh@joshuaspence.com">josh@joshuaspence.com</a>><br>
# Date 1483420178 -39600<br>
# Tue Jan 03 16:09:38 2017 +1100<br>
</span># Branch error_log_escape<br>
# Node ID 30a9ce42aadfcf135f3ad95a296c95<wbr>e7044d5f9d<br>
<span class=""># Parent a42afc225e98afe1f7c3b428c81c8a<wbr>f7eba362c0<br>
Escape strings produced by the `%V` format specification<br>
<br>
Currently nginx error logs loosely adhere to the following pattern:<br>
<br>
```<br>
%{TIMESTAMP} [%{SEVERITY}] %{PROCESS_ID}#%{THREAD_ID}:( *%{CONNECTION_ID})?%{MESSAGE}(<wbr>, %{KEY}: "%{VALUE}")*<br>
```<br>
<br>
In our environment we are using Filebeat to ship nginx error logs into Logstash. Our Logstash filter for parsing nginx error logs is as follows:<br>
<br>
```<br>
filter {<br>
grok {<br>
match => {<br>
"message" => "(?<timestamp>%{YEAR}/%{<wbr>MONTHNUM2}/%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:process_id:int}#%{<wbr>NONNEGINT:thread_id:int}:(?: \*%{NONNEGINT:connection_id:<wbr>int})? (?<message>[^,]*)(?:, %{GREEDYDATA:nginx_keyvalue_<wbr>data})?"<br>
}<br>
overwrite => ["message"]<br>
}<br>
<br>
date {<br>
match => ["timestamp", "yyyy/MM/dd HH:mm:ss"]<br>
remove_field => ["timestamp"]<br>
}<br>
<br>
kv {<br>
field_split => ", "<br>
remove_field => "nginx_keyvalue_data"<br>
source => "nginx_keyvalue_data"<br>
value_split => ": "<br>
}<br>
}<br>
```<br>
<br>
The problem, however, is that it is difficult to unambigiously parse the key-value data from the nginx error log format because of a lack of data sanitization. Specifically, consider the following entry from our nginx error logs:<br>
<br>
```<br>
2016/11/15 03:45:51 [error] 23770#23770: *21535000 open() "/REDACTED/+"BauGutachter"++<wbr>41+2016+.ch&tbo=1" failed (2: No such file or directory), client: REDACTED, server: REDACTED, request: "GET /job/+%22BauGutachter%22++41+<wbr>2016+.ch&tbo=1 HTTP/1.1", host: "REDACTED", referrer: "<a href="https://www.google.com.au/search?hl=en&q=" rel="noreferrer" target="_blank">https://www.google.com.au/<wbr>search?hl=en&q=</a>"BauGutachter" +41 2016 .ch&num=100&start=0"<br>
```<br>
<br>
Specifically, the problem in the above sample is the `referrer` field, the value of which contains double quotes. I think that it makes sense to escape `\` and `"` characters before they are written to the error log.<br>
<br>
</span>diff -r a42afc225e98 -r 30a9ce42aadf src/core/ngx_string.c<br>
<div class="HOEnZb"><div class="h5">--- a/src/core/ngx_string.c Tue Dec 27 17:23:08 2016 +0300<br>
+++ b/src/core/ngx_string.c Tue Jan 03 16:09:38 2017 +1100<br>
@@ -15,7 +15,7 @@<br>
const u_char *basis, ngx_uint_t padding);<br>
static ngx_int_t ngx_decode_base64_internal(<wbr>ngx_str_t *dst, ngx_str_t *src,<br>
const u_char *basis);<br>
-<br>
+static uintptr_t ngx_escape_string(u_char *dst, u_char *src, size_t size);<br>
<br>
void<br>
ngx_strlow(u_char *dst, u_char *src, size_t n)<br>
@@ -235,7 +235,7 @@<br>
v = va_arg(args, ngx_str_t *);<br>
<br>
len = ngx_min(((size_t) (last - buf)), v->len);<br>
- buf = ngx_cpymem(buf, v->data, len);<br>
+ buf = (u_char *) ngx_escape_string(buf, v->data, len);<br>
fmt++;<br>
<br>
continue;<br>
@@ -1844,6 +1844,43 @@<br>
}<br>
<br>
<br>
+static uintptr_t<br>
+ngx_escape_string(u_char *dst, u_char *src, size_t size)<br>
+{<br>
+ u_char ch;<br>
+ ngx_uint_t len;<br>
+<br>
+ if (dst == NULL) {<br>
+ len = 0;<br>
+<br>
+ while (size) {<br>
+ ch = *src++;<br>
+<br>
+ if (ch == '\\' || ch == '"') {<br>
+ len++;<br>
+ }<br>
+<br>
+ size--;<br>
+ }<br>
+<br>
+ return (uintptr_t) len;<br>
+ }<br>
+<br>
+ while (size) {<br>
+ ch = *src++;<br>
+<br>
+ if (ch == '\\' || ch == '"') {<br>
+ *dst++ = '\\';<br>
+ }<br>
+<br>
+ *dst++ = ch;<br>
+ size--;<br>
+ }<br>
+<br>
+ return (uintptr_t) dst;<br>
+}<br>
+<br>
+<br>
void<br>
ngx_str_rbtree_insert_value(<wbr>ngx_rbtree_node_t *temp,<br>
ngx_rbtree_node_t *node, ngx_rbtree_node_t *sentinel)<br>
</div></div></blockquote></div><br></div>