Adding support for transparent reverse proxies

Mon Oct 8 02:00:14 UTC 2018

So, I'm looking to add support to nginx for transparent reverse proxies
via UNIX sockets, and am hoping for some advice on how to do it with
minimal impact on the rest of the server (in both performance and code
complexity).

For clarity, a transparent reverse proxy takes an incoming (TCP)
connection and passes a copy of its file descriptor to an 'upstream'
backend over a UNIX socket, using the sendmsg system call with
SCM_RIGHTS ancillary data.  The proxy decides which upstream server
should receive the file descriptor by calling recv on the connection
with the MSG_PEEK flag and inspecting either the HTTP Host header or
the TLS ClientHello (SNI) to determine the intended hostname.  The
upstream backend receives the file descriptor via recvmsg, and then
proceeds as though it were directly listening on the TCP interface.
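
As a rough sketch of the descriptor passing described above (not the
actual proxy code; the helper names send_fd and recv_fd are made up for
illustration), the two ends could look something like this.  It assumes
a connected AF_UNIX stream socket and passes exactly one descriptor per
message:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send one file descriptor over a connected UNIX
 * socket using SCM_RIGHTS.  Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 0;  /* one byte of real data carries the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;           /* forces correct alignment */
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one file descriptor sent by send_fd.
 * Returns the new descriptor, or -1 on error or peer close. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET
        || cmsg->cmsg_type != SCM_RIGHTS
        || cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The descriptor returned by recv_fd is a new number in the receiving
process that refers to the same open connection, so the sender can
close its own copy as soon as sendmsg returns.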

Transparent proxies have a number of performance and convenience
advantages over the conventional HTTP proxy protocol.  On the
performance front, once the reverse proxy passes the file descriptor to
the upstream backend, it can immediately close its copy of the file
handle.  This accomplishes several things.  First, there is no need to
tell the upstream host anything about the file handle (like peer address
or port), as upstream can use the usual socket functions to obtain that
information, or set any socket options needed.  Second, there is no need
to maintain a connection between the proxy and upstream, so the number
of open file handles per connection is reduced by 1.  Third, there is no
need to copy or buffer data between the TCP connection and the upstream
server, which means large files can be sent directly via the sendfile
system call.
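
To illustrate the first point, here is a small sketch of how a backend
might recover the client address from a descriptor it was handed,
using nothing but getpeername.  The function name describe_peer and
the "addr:port" output format are my own choices for the example:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: write the peer of a connected TCP socket into
 * out as "addr:port" (IPv4) or "[addr]:port" (IPv6).
 * Returns 0 on success, -1 on error or unsupported address family. */
int describe_peer(int fd, char *out, size_t outlen)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof(ss);
    char host[INET6_ADDRSTRLEN];

    /* Works the same whether fd came from accept() or from recvmsg(). */
    if (getpeername(fd, (struct sockaddr *)&ss, &len) != 0)
        return -1;

    if (ss.ss_family == AF_INET) {
        struct sockaddr_in *in4 = (struct sockaddr_in *)&ss;
        if (inet_ntop(AF_INET, &in4->sin_addr, host, sizeof(host)) == NULL)
            return -1;
        snprintf(out, outlen, "%s:%u", host,
                 (unsigned)ntohs(in4->sin_port));
        return 0;
    }

    if (ss.ss_family == AF_INET6) {
        struct sockaddr_in6 *in6 = (struct sockaddr_in6 *)&ss;
        if (inet_ntop(AF_INET6, &in6->sin6_addr, host, sizeof(host)) == NULL)
            return -1;
        snprintf(out, outlen, "[%s]:%u", host,
                 (unsigned)ntohs(in6->sin6_port));
        return 0;
    }

    return -1;
}
```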

On the complexity front, beyond adding one indirection step in obtaining
the file descriptor, the upstream backend has no increased complexity
relative to running directly on the TCP port.  There is no need to read
the peer address from the proxy.  Socket options can be set directly on
the file descriptor, rather than relaying requests to set them through
the proxy.  Calling shutdown on the file descriptor shuts down the
connection to the client directly: there is no added delay, and no
chance of the proxy failing to flush pending data or close the
connection.  Additionally,
the reverse proxy is greatly simplified, taking less than 100 LoC, most
of which are simply to parse the TLS headers.
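
For the plain HTTP case, the routing decision on the proxy side can be
sketched as below.  This is only an illustration, not the proxy code
itself: peek_http_host is a made-up name, the 4096-byte buffer is an
arbitrary choice, and it assumes the request line and Host header
arrive in the first read.  The TLS case (parsing the ClientHello for
SNI) is longer and not shown.

```c
#include <string.h>
#include <strings.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: look at the pending request without consuming
 * it and copy the value of the Host header into host.
 * Returns 0 if found, -1 if not found, incomplete, or too long. */
int peek_http_host(int fd, char *host, size_t hostlen)
{
    char buf[4096];
    char *line = buf;
    ssize_t n;

    /* MSG_PEEK leaves the data queued, so the backend that eventually
     * receives the descriptor still reads the full request. */
    n = recv(fd, buf, sizeof(buf) - 1, MSG_PEEK);
    if (n <= 0)
        return -1;
    buf[n] = '\0';

    /* Walk the header lines that follow the request line. */
    while ((line = strstr(line, "\r\n")) != NULL) {
        line += 2;
        if (line[0] == '\r')
            break;                      /* blank line: end of headers */

        if (strncasecmp(line, "Host:", 5) == 0) {
            char *val = line + 5;
            size_t len;

            while (*val == ' ' || *val == '\t')
                val++;
            len = strcspn(val, "\r\n");
            if (val[len] != '\r')
                return -1;              /* header line not complete yet */
            if (len == 0 || len >= hostlen)
                return -1;

            memcpy(host, val, len);
            host[len] = '\0';
            return 0;
        }
    }

    return -1;
}
```

The value is returned as sent, so a "host:port" form would still need
the port stripped before looking up the upstream.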

Anyway, after looking at the development_guide for nginx, and poking
through the source code, I see a couple of possible ways to implement
this.  The simplest way is to only accept a single file descriptor per
incoming connection on the UNIX socket, shutting down the incoming
connection after receiving one file descriptor.  This can be done inside
ngx_event_accept, by retrieving the file descriptor, shutting down the
accepted connection (c->fd), and setting c->fd to the new file
descriptor.  This is trivially simple for testing, but unfortunately
requires the reverse proxy to be reliably fast at passing the incoming
file descriptor.  I think a better solution would be a new
ls->handler, which runs when the socket is ready for reading, and then
runs the existing ls->handler (ngx_http_init_connection) once the file
descriptor is fetched and the existing socket is closed. 

The more intrusive but technically superior way to implement running
behind a transparent reverse proxy is to reuse the socket connection
from the reverse proxy for an unlimited number of file descriptors.
This would involve a new handler for ngx_connection_t, which adds the
connection to the list of connections watched by poll, kqueue or
similar.  When the connection is ready for reading, recvmsg is used to
fetch the file descriptor(s), which are then initialized like a normal
http connection.  The UNIX socket connection would then persist either
until the peer disconnects, or nginx shuts down.
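
Outside of nginx, the receiving side of such a persistent connection
might look roughly like the sketch below: each time the event loop
reports the UNIX socket readable, one recvmsg call collects however
many descriptors that message carried.  The names recv_fds, send_fds
and MAX_FDS are hypothetical, and the cap of 16 descriptors per message
is an arbitrary assumption; the sender is only included so the example
can be tried on its own.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_FDS 16  /* assumed upper bound on descriptors per message */

/* Hypothetical helper: send n descriptors (1..MAX_FDS) in one message.
 * Returns 0 on success, -1 on error. */
int send_fds(int sock, const int *fds, int n)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int) * MAX_FDS)];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    if (n < 1 || n > MAX_FDS)
        return -1;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = CMSG_SPACE(sizeof(int) * n);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int) * n);
    memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * n);

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: call when sock is readable.  Stores up to
 * max_fds received descriptors in fds and closes any extras.
 * Returns the number stored, or -1 on error or peer disconnect. */
int recv_fds(int sock, int *fds, int max_fds)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int) * MAX_FDS)];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *c;
    int count = 0;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        int i, n;

        if (c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
            continue;

        n = (int)((c->cmsg_len - CMSG_LEN(0)) / sizeof(int));
        for (i = 0; i < n; i++) {
            int fd;
            memcpy(&fd, CMSG_DATA(c) + i * sizeof(int), sizeof(int));
            if (count < max_fds)
                fds[count++] = fd;
            else
                close(fd);              /* no room: do not leak it */
        }
    }

    return count;
}
```

Each descriptor returned would then be handed to the normal connection
setup, while the UNIX socket itself stays registered for the next
readable event.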

I have implemented the first method, as a proof of concept, but I have
several questions before trying to implement the second method. 

First, should this use a new nginx listener, rather than simply a
setting on the existing listener?  I suspect the answer is 'yes', since
I think it needs its own handler, which either runs before or instead of
ngx_http_init_connection.

Second, is there any reason the UNIX socket connections can't get put in
the same pool as the TCP connections? 

Third, should this be implemented in its own file, similar to how
HTTP/2 is separated out?

Fourth, what should the config file syntax be?  In my proof of concept
version, I just added a flag after the 'listen unix:path', but I could
see an advantage to defining the expected file descriptors separately
from the UNIX socket.