Message-ID: <CAJPywTLPzBz50LW7awNMAEOUdjLt4spz3vQ6i3BRKOp2qzBq4g@mail.gmail.com>
Date: Tue, 3 Dec 2019 15:59:15 +0100
From: Marek Majkowski <marek@...udflare.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Jakub Sitnicki <jakub@...udflare.com>,
Network Development <netdev@...r.kernel.org>,
kernel-team <kernel-team@...udflare.com>
Subject: Re: Delayed source port allocation for connected UDP sockets
On Mon, Dec 2, 2019 at 5:03 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
> So bind might succeed, but connect fail later if the port is already
> bound by another socket in between?
Yes, I'm proposing to delay the bind() until connect(). The
semantics should remain the same; the actual bind work would just be
done atomically in the context of connect.
As mentioned, this is basically what the connectx syscall does on some BSDs.
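For reference, the conventional two-step sequence leaves a window: once bind() returns, the port is hashed and reachable by any sender, and only the later connect() restricts the socket to one peer. A minimal loopback sketch of that window, where the function name and the placeholder peer 127.0.0.1:9 are illustrative assumptions rather than part of the proposal:

```python
import socket


def datagram_in_bind_connect_window(peer=("127.0.0.1", 9)):
    """Return (payload, came_from_stranger) for a datagram that arrives
    after bind() but before connect()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    stranger = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Step 1: bind() allocates the port and hashes the socket, so from
        # this point on any sender can reach it.
        sock.bind(("127.0.0.1", 0))
        port = sock.getsockname()[1]

        # An unrelated sender hits the port before connect() has run.
        stranger.sendto(b"not from the peer", ("127.0.0.1", port))
        sock.settimeout(1.0)
        payload, src = sock.recvfrom(64)

        # Step 2: only now does connect() restrict the socket to one peer.
        sock.connect(peer)
        return payload, src[1] == stranger.getsockname()[1]
    finally:
        sock.close()
        stranger.close()
```

If bind and connect were done atomically inside connect(), as proposed, there would be no point at which such a datagram could be queued.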
> Related, I have toyed with unhashed sockets with inet_sport set in the
> past for a different use-case: transmit-only sockets. If all receive
> processing happens on a small set (say, per cpu) of unconnected
> listening sockets. Then have unhashed transmit-only connected sockets
> to transmit without route lookup. But the route caching did not
> warrant the cost of maintaining a socket per connection at scale.
This is interesting. We have another use case for that: with TPROXY, we need
to _source_ packets from an arbitrary port number. The source port of a UDP
socket can't be set with the usual IP_PKTINFO. Therefore, to source packets
from an arbitrary port number we are planning to either:
- use raw sockets, or
- open a socket on a useless IP but with the specific source port, like
127.0.0.99:1234, and call sendto() on it with an arbitrary target.
Having proper unhashed sockets would make it slightly less hacky.
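A sketch of the second option, assuming (as is usual with TPROXY) that IP_TRANSPARENT is set so a non-local source address is accepted, and that the source address is supplied per packet through the ipi_spec_dst field of IP_PKTINFO. The helper names are hypothetical, and the numeric fallbacks 8 and 19 are the Linux values of IP_PKTINFO and IP_TRANSPARENT, used only if the Python build lacks those constants. Note that struct in_pktinfo has no port field, so the port can only come from bind():

```python
import socket
import struct

# Linux values, used only if this Python build lacks the named constants.
IP_PKTINFO = getattr(socket, "IP_PKTINFO", 8)
IP_TRANSPARENT = getattr(socket, "IP_TRANSPARENT", 19)


def pktinfo_cmsg(src_ip, ifindex=0):
    """Build an IP_PKTINFO control message that sets one datagram's source address.

    Layout of struct in_pktinfo: int ipi_ifindex; struct in_addr ipi_spec_dst;
    struct in_addr ipi_addr (ignored on send). There is no field for the port.
    """
    data = struct.pack("@i4s4s", ifindex,
                       socket.inet_aton(src_ip), socket.inet_aton("0.0.0.0"))
    return (socket.IPPROTO_IP, IP_PKTINFO, data)


def open_parked_port(sport, parking_ip="127.0.0.99"):
    """Hypothetical helper: the source port is fixed by bind() on an IP that
    otherwise sees no traffic. Setting IP_TRANSPARENT needs CAP_NET_ADMIN."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, IP_TRANSPARENT, 1)
    s.bind((parking_ip, sport))
    return s


def send_from(s, payload, src_ip, dst):
    """Send payload to dst with source src_ip and the port bound on s."""
    return s.sendmsg([payload], [pktinfo_cmsg(src_ip)], 0, dst)
```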
[...]
> If CAP_NET_RAW is no issue, Maciej's suggestion of temporarily binding
> to a dummy device (or even lo) might be the simplest approach?
Oh boy. I thought I knew enough UDP hacks in Linux, but this takes it
to the next level. Indeed, it works:
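import socket
sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Needs CAP_NET_RAW. dummy0 receives no traffic, so nothing reaches the socket before connect().
sd.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
sd.bind(('0.0.0.0', 1234))
sd.connect(("1.1.1.1", 53))
# Connected: an empty device name clears the device binding again.
sd.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"")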
The caveat is that dummy0 must be up. But this successfully
eliminates the race.
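Since the trick depends on dummy0 being up (for example created with "ip link add dummy0 type dummy" and "ip link set dummy0 up"), it may be worth checking that before relying on it. A sketch, assuming Linux sysfs, where /sys/class/net/DEV/flags holds the interface flags in hex and IFF_UP is bit 0x1. The helper names are hypothetical, and SO_BINDTODEVICE falls back to its Linux value 25 if the constant is missing:

```python
import os
import socket

IFF_UP = 0x1  # from <linux/if.h>
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)  # 25 on Linux


def flags_say_up(text):
    """Parse the hex contents of a sysfs 'flags' file and test IFF_UP."""
    return bool(int(text.strip(), 16) & IFF_UP)


def iface_is_up(name, sysfs_root="/sys/class/net"):
    """False if the interface is down, missing, or sysfs is unavailable."""
    try:
        with open(os.path.join(sysfs_root, name, "flags")) as f:
            return flags_say_up(f.read())
    except (OSError, ValueError):
        return False


def connect_via_parking_device(local, remote, dev="dummy0"):
    """Hypothetical wrapper around the dummy-device trick (needs CAP_NET_RAW)."""
    if not iface_is_up(dev):
        raise OSError(f"{dev} is missing or down; the trick needs it up")
    sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Parked on dev, the socket cannot receive anything while it is
        # bound but not yet connected.
        sd.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, dev.encode())
        sd.bind(local)
        sd.connect(remote)
        # Connected: clear the device binding, as in the snippet above.
        sd.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, b"")
    except BaseException:
        sd.close()
        raise
    return sd
```

Closing the socket on any failure matters here because, as noted earlier in the thread, connect() can still fail after bind() has succeeded.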
Thanks for the suggestions,
Marek