[patch] Set SO_REUSEADDR on outgoing TCP connections
mdounin at mdounin.ru
Fri Apr 11 12:26:58 UTC 2014
On Thu, Apr 10, 2014 at 11:20:29PM +0100, Marek Majkowski wrote:
> On Thu, Apr 10, 2014 at 6:50 PM, Maxim Dounin <mdounin at mdounin.ru> wrote:
> > On Thu, Apr 10, 2014 at 05:04:30PM +0100, Marek Majkowski wrote:
> >> On Thu, Apr 10, 2014 at 4:40 PM, Maxim Dounin <mdounin at mdounin.ru> wrote:
> >> >> ...
> >> >> The patch will work perfectly well assuming there aren't too many
> >> >> connections to one destination address and port. If that happens the
> >> >> kernel may randomly allocate an outgoing port number that is already
> >> >> used for a given destination, and the attempt to connect() will fail with
> >> >> EADDRNOTAVAIL. This is fairly easy to detect, and we can just retry
> >> >> connecting again, using another random source port allocated by the
> >> >> kernel.
> >> >
> >> > While it may be an interesting approach to overcome the limitation, I
> >> > don't think that this is something that should be done by clients
> >> > in real life. I think it's something the kernel should care about,
> >> > not clients.
> >> >
> >> > From a practical point of view, trivial solutions are to avoid
> >> > bind() or use multiple addresses for bind().
> >> May I ask how can you specify multiple outgoing IP addresses for a single proxy?
> > There is more than one way to do it. The most trivial solution would be to
> > use proxy_bind with a variable, and rotate the address in the variable
> > somehow (split_clients, map, ...).
> >> Indeed, this patch does add complexity to the ngx_event_connect_peer
> >> function. Unfortunately nginx currently supports bind before connect
> >> and in its current form it severely reduces the number of possible
> >> outgoing connections. Using proxy_bind in its current form is harmful.
> >> This patch fixes that.
> > As suggested above, if it doesn't work for you - don't use it.
> > (Actually, the bad thing is that on many OSes it's not possible to
> > create more than 64k outgoing connections even without doing a
> > bind(). And proxy_bind comes to the rescue, despite the fact that it
> > was originally implemented for a completely unrelated purpose.)
> >> In a perfect world the kernel would be able to reuse ports established with
> >> bind before connect. Actually the kernel can do that already - this is
> >> what the SO_REUSEADDR flag is for. Unfortunately, due to BSD API
> >> compatibility the kernel needs to allocate the outgoing port on bind() -
> >> before it is told of the destination address. This inevitably may lead
> >> to conflicts indicated by an EADDRNOTAVAIL error on connect(), and
> >> results in complexity in a proper implementation of bind before connect.
> >> A reasonable solution to bind before connect limitations is to either
> >> merge this patch or retire the proxy_bind option.
> > It's more or less obvious that the kernel can't do proper source
> > port selection in the "bind(), then connect()" scenario without
> > some API extension (e.g., to allow postponing source port allocation
> > until the destination address is known).
> > Overcoming this limitation with SO_REUSEADDR and multiple retries
> > looks wrong though. It's not a solution, it's at most a workaround.
> > And there are other workarounds available, see above.
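
The approach debated above (set SO_REUSEADDR, bind() to the local
address with port 0, then retry when connect() fails with
EADDRNOTAVAIL) can be sketched in C roughly as follows. This is a
minimal blocking sketch and not the patch itself: the function name
connect_bound_retry and the max_tries parameter are hypothetical, and
nginx's ngx_event_connect_peer works with non-blocking sockets.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind to src (port 0, so the kernel picks the
 * port) with SO_REUSEADDR set, then connect to dst.  If connect()
 * reports EADDRNOTAVAIL, start over with a fresh socket so the kernel
 * picks another port.  Returns a connected fd, or -1 with errno set.
 */
int connect_bound_retry(const struct sockaddr_in *src,
                        const struct sockaddr_in *dst, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
            return -1;
        }

        int one = 1;
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0
            || bind(fd, (const struct sockaddr *) src, sizeof(*src)) < 0)
        {
            int err = errno;
            close(fd);
            errno = err;
            return -1;
        }

        if (connect(fd, (const struct sockaddr *) dst, sizeof(*dst)) == 0) {
            return fd;
        }

        int err = errno;          /* close() may overwrite errno */
        close(fd);
        if (err != EADDRNOTAVAIL) {
            errno = err;
            return -1;            /* a real failure, do not retry */
        }
    }

    errno = EADDRNOTAVAIL;
    return -1;
}
```

The retry needs a new socket each time because the source port is
fixed at bind(), before the destination is known.
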
> How about yet another approach: maybe it's nginx that should do
> the port allocation logic. We could transform the bind before connect
> thing to suggest a specific port to the kernel. This would reduce the
> complexity of code run in kernel as it wouldn't need to do allocation
> for us, just accept (or in rare cases reject) the bind attempt. Using
> the SO_REUSEADDR flag will still be necessary to be able to reuse ports.
> The retry logic would need to be there, but the probability of retry
> will be extremely low - as we are going to control the port allocation
> for nginx outgoing connections. We'd need to retry only if there is a
> conflict with a connection established by another process, which
> shouldn't happen too often unless the outgoing IP address is used by
> another application.
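
The proposal quoted above, where the application suggests a specific
source port and the kernel merely accepts or rejects it, might look
like the sketch below. It is a simplified blocking illustration under
stated assumptions: connect_from_range and its port-range parameters
are hypothetical names, and it walks the range sequentially, whereas
the proposal implies nginx would keep track of the ports it has
already handed out.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: try each source port in [first, last] in turn.
 * bind() with SO_REUSEADDR asks the kernel for that exact port; if
 * bind() or connect() reports a conflict, move on to the next port.
 * Returns a connected fd, or -1 with errno set.
 */
int connect_from_range(struct sockaddr_in src, const struct sockaddr_in *dst,
                       unsigned first, unsigned last)
{
    for (unsigned port = first; port <= last && port <= 65535; port++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) {
            return -1;
        }

        int one = 1;
        src.sin_port = htons((unsigned short) port);

        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) == 0
            && bind(fd, (struct sockaddr *) &src, sizeof(src)) == 0
            && connect(fd, (const struct sockaddr *) dst, sizeof(*dst)) == 0)
        {
            return fd;
        }

        int err = errno;          /* close() may overwrite errno */
        close(fd);
        if (err != EADDRINUSE && err != EADDRNOTAVAIL) {
            errno = err;
            return -1;            /* not a port conflict, give up */
        }
    }

    errno = EADDRNOTAVAIL;
    return -1;
}
```
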
This is just yet another workaround, and a complex one. As
suggested above, the simplest workaround is to not use bind() at
all. As long as you are using Linux, try "ip route add ... src
<address>" instead; I've never tried it, but it should be identical to
"route ... -ifa <address>" on FreeBSD and shouldn't impose the 64k
source ports limit.
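
For comparison, a client that skips bind() entirely leaves both the
source address and the source port to the kernel, which chooses them
at connect() time, when the destination is already known. The sketch
below shows this and reports what the kernel picked via getsockname().
The function name connect_unbound is hypothetical, and the routing
configuration that influences the chosen source address is outside the
code.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: connect to dst without calling bind().
 * On success, *chosen holds the local address and port the kernel
 * selected.  Returns a connected fd, or -1 with errno set.
 */
int connect_unbound(const struct sockaddr_in *dst, struct sockaddr_in *chosen)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    socklen_t len = sizeof(*chosen);

    if (connect(fd, (const struct sockaddr *) dst, sizeof(*dst)) < 0
        || getsockname(fd, (struct sockaddr *) chosen, &len) < 0)
    {
        int err = errno;          /* close() may overwrite errno */
        close(fd);
        errno = err;
        return -1;
    }

    return fd;
}
```
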
Alternatively, if you want to solve the problem properly, it
should be done in the kernel - likely by introducing something like
SO_BINDLATER to allow bind() to postpone source port allocation