netdev - Re: [RFC 0/2] Delayed binding of UDP sockets for Quic per-connection sockets


Message-id: <20181101035050.GO80792@MacBook-Pro-19.local>
Date:   Wed, 31 Oct 2018 20:50:50 -0700
From:   Christoph Paasch <cpaasch@...le.com>
To:     Eric Dumazet <eric.dumazet@...il.com>
Cc:     netdev@...r.kernel.org, Ian Swett <ianswett@...gle.com>,
        Leif Hedstrom <lhedstrom@...le.com>,
        Jana Iyengar <jri.ietf@...il.com>
Subject: Re: [RFC 0/2] Delayed binding of UDP sockets for Quic per-connection
 sockets

On 31/10/18 - 17:53:22, Eric Dumazet wrote:
> On 10/31/2018 04:26 PM, Christoph Paasch wrote:
> > Implementations of Quic might want to create a separate socket for each
> > Quic-connection by creating a connected UDP-socket.
> > 
> 
> Nice proposal, but I doubt a QUIC server can afford having one UDP socket per connection ?
> 
> It would add a huge overhead in term of memory usage in the kernel,
> and lots of epoll events to manage (say a QUIC server with one million flows, receiving
> very few packets per second per flow)
> 
> Maybe you could elaborate on the need of having one UDP socket per connection.

I'll let Leif chime in on that, as the request came from him. Leif and his
team are implementing Quic in Apache Traffic Server.


One advantage I can see is that it would let each connection benefit from
fq pacing, since one could simply set sk_pacing_rate on the socket. That way,
there is no longer any need to implement pacing in user space.
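
To make the pacing point concrete, here is a minimal sketch. It assumes that
the user-space knob is the existing SO_MAX_PACING_RATE socket option (which
caps the rate the kernel paces at) and that the fq qdisc is installed on the
egress interface. The helper names are made up for illustration, and the
fallback value 47 is the asm-generic one, only used if the libc headers do
not define the option.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_MAX_PACING_RATE
#define SO_MAX_PACING_RATE 47 /* asm-generic value; assumption for old headers */
#endif

/* Cap the pacing rate of one socket, in bytes per second. 0 on success. */
static int set_pacing_rate(int fd, unsigned int bytes_per_sec)
{
    return setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
                      &bytes_per_sec, sizeof(bytes_per_sec));
}

/* Read back the cap currently configured on the socket. 0 on success. */
static int get_pacing_rate(int fd, unsigned int *bytes_per_sec)
{
    socklen_t len = sizeof(*bytes_per_sec);

    return getsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
                      bytes_per_sec, &len);
}
```

With one connected UDP socket per Quic connection, the server would call
set_pacing_rate() on that socket whenever its congestion controller updates
the rate, instead of scheduling each packet itself.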


> > To achieve that on the server-side, a "master-socket" needs to wait for
> > incoming new connections and then creates a new socket that will be a
> > connected UDP-socket. To create that latter one, the server needs to
> > first bind() and then connect(). However, after the bind() the server
> > might already receive traffic on that new socket that is unrelated to the
> > Quic-connection at hand. Only after the connect() a full 4-tuple match
> > is happening. So, one can't really create this kind of a server that has
> > a connected UDP-socket per Quic connection.
> > 
> > So, what is needed is an "atomic bind & connect" that basically
> > prevents any incoming traffic until the connect() call has been issued
> > at which point the full 4-tuple is known.
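
For reference, the window described above can be reproduced on loopback with
plain bind() + connect(). This is only a sketch under assumptions: the helper
names are hypothetical, everything runs on 127.0.0.1, and it relies on the
Linux behaviour that datagrams already queued on the socket are not dropped
by connect().

```c
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a UDP socket bound to 127.0.0.1 on an ephemeral port; -1 on error. */
static int udp_bound_loopback(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0; /* let the kernel pick the port */
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Returns how many bytes from an unrelated sender were received by the
 * per-connection socket because they arrived between bind() and connect(),
 * or -1 if nothing was received or an error occurred.
 */
static ssize_t stray_bytes_between_bind_and_connect(void)
{
    struct sockaddr_in conn_addr, peer_addr, stray_addr;
    struct pollfd pfd;
    char buf[16];
    ssize_t n = -1;
    int conn = udp_bound_loopback(&conn_addr);   /* per-connection socket */
    int peer = udp_bound_loopback(&peer_addr);   /* the intended peer */
    int stray = udp_bound_loopback(&stray_addr); /* unrelated sender */

    if (conn < 0 || peer < 0 || stray < 0)
        goto out;

    /* conn is bound but not connected yet, so any sender matches it. */
    if (sendto(stray, "stray", 5, 0,
               (struct sockaddr *)&conn_addr, sizeof(conn_addr)) != 5)
        goto out;

    /* From here on, only packets from peer_addr match the socket... */
    if (connect(conn, (struct sockaddr *)&peer_addr, sizeof(peer_addr)) < 0)
        goto out;

    /* ...but the datagram queued before connect() is still delivered. */
    pfd.fd = conn;
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, 1000) == 1)
        n = recv(conn, buf, sizeof(buf), MSG_DONTWAIT);
out:
    if (conn >= 0)
        close(conn);
    if (peer >= 0)
        close(peer);
    if (stray >= 0)
        close(stray);
    return n;
}
```

The stray datagram ends up on a socket that is supposed to belong to a single
Quic connection, which is the traffic the delayed bind is meant to keep out.
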
> > 
> > 
> > This patchset implements this functionality and exposes a socket-option
> > to do this.
> > 
> > Usage would be:
> > 
> >         int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
> > 
> >         int val = 1;
> >         setsockopt(fd, SOL_SOCKET, SO_DELAYED_BIND, &val, sizeof(val));
> > 
> >         bind(fd, (struct sockaddr *)&src, sizeof(src));
> > 
> >         /* At this point, incoming traffic will never match on this socket */
> > 
> >         connect(fd, (struct sockaddr *)&dst, sizeof(dst));
> > 
> >         /* Only now incoming traffic will reach the socket */
> > 
> > 
> > 
> > There is literally an infinite number of ways on how to implement it,
> > which is why I first send it out as an RFC. With this approach here I
> > chose the least invasive one, just preventing the match on the incoming
> > path.
> > 
> > 
> > The reason for choosing a SOL_SOCKET socket-option and not at the
> > SOL_UDP-level is because that functionality actually could be useful for
> > other protocols as well. E.g., TCP wants to better use the full 4-tuple space
> > by binding to the source-IP and the destination-IP at the same time.
> 
> Passive TCP flows can not benefit from this idea.
> 
> Active TCP flows can already do that, I do not really understand what you are suggesting.

What we had here is that we wanted to let a server initiate more than 64K
connections *while* also binding to a source IP.
With TCP, the bind() would then pick a source port, and we ended up hitting
the 64K limit. If we could do an atomic "bind + connect", source-port
selection could ensure that the 4-tuple is unique.
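
A small sketch of what I mean by bind() picking the source port. The function
name is made up, and 127.0.0.1 just stands in for the real source IP; it only
shows that with a plain bind() to port 0 the port is already fixed before the
destination is known.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind a TCP socket to 127.0.0.1 with port 0 and return the port the kernel
 * assigned at bind() time (host byte order), or 0 on error.
 */
static unsigned short port_chosen_at_bind(void)
{
    struct sockaddr_in src;
    socklen_t len = sizeof(src);
    unsigned short port = 0;
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    if (fd < 0)
        return 0;

    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* stand-in source IP */
    src.sin_port = 0;                             /* "any port" */

    /* No connect() has happened, yet a concrete port is already assigned. */
    if (bind(fd, (struct sockaddr *)&src, sizeof(src)) == 0 &&
        getsockname(fd, (struct sockaddr *)&src, &len) == 0)
        port = ntohs(src.sin_port);

    close(fd);
    return port;
}
```

Since the port is chosen without knowing the destination, it has to be unique
per source IP, not per 4-tuple.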

Or has something changed recently that allows 4-tuple matching to be used
when doing this with TCP?


Christoph