Message-id: <20181101035050.GO80792@MacBook-Pro-19.local>
Date: Wed, 31 Oct 2018 20:50:50 -0700
From: Christoph Paasch <cpaasch@...le.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: netdev@...r.kernel.org, Ian Swett <ianswett@...gle.com>,
Leif Hedstrom <lhedstrom@...le.com>,
Jana Iyengar <jri.ietf@...il.com>
Subject: Re: [RFC 0/2] Delayed binding of UDP sockets for Quic per-connection
sockets
On 31/10/18 - 17:53:22, Eric Dumazet wrote:
> On 10/31/2018 04:26 PM, Christoph Paasch wrote:
> > Implementations of Quic might want to create a separate socket for each
> > Quic-connection by creating a connected UDP-socket.
> >
>
> Nice proposal, but I doubt a QUIC server can afford having one UDP socket per connection?
>
> It would add a huge overhead in terms of memory usage in the kernel,
> and lots of epoll events to manage (say, a QUIC server with one million flows, receiving
> very few packets per second per flow).
>
> Maybe you could elaborate on the need of having one UDP socket per connection.

I'll let Leif chime in on that, as the ask came from him. Leif and his team are
implementing Quic in Apache Traffic Server.

One advantage I can see is that it would let the application benefit from fq
pacing, since one could simply set sk_pacing_rate on the socket. That way there
is no need to implement pacing in user space anymore.
> > To achieve that on the server-side, a "master-socket" needs to wait for
> > incoming new connections and then creates a new socket that will be a
> > connected UDP-socket. To create that latter one, the server needs to
> > first bind() and then connect(). However, after the bind() the server
> > might already receive traffic on that new socket that is unrelated to the
> > Quic-connection at hand. Only after the connect() does a full 4-tuple
> > match happen. So, one can't really build this kind of server, with
> > a connected UDP-socket per Quic connection.
> >
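
To make the window concrete, here is a minimal sketch on loopback. It only
shows that a socket that is bound but not yet connected accepts a datagram
from an arbitrary sender; it does not model the master-socket setup, and the
function name is made up for illustration.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the number of bytes a bound-but-unconnected UDP socket received
 * from an unrelated sender, or -1 if nothing arrived or on error. */
static int stray_bytes_before_connect(void)
{
    struct sockaddr_in src;
    socklen_t len = sizeof(src);
    struct pollfd pfd;
    char buf[16];
    int n = -1;

    int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    int stray = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (fd < 0 || stray < 0)
        goto out;

    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    src.sin_port = 0;                 /* let the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0 ||
        getsockname(fd, (struct sockaddr *)&src, &len) < 0)
        goto out;
    assert(len == sizeof(src));

    /* The window: fd is bound but not connected, so any sender that hits
     * the local address and port matches, whatever its own address is. */
    if (sendto(stray, "stray", 5, 0,
               (struct sockaddr *)&src, sizeof(src)) != 5)
        goto out;

    /* Wait up to one second for the unrelated datagram to show up. */
    pfd.fd = fd;
    pfd.events = POLLIN;
    if (poll(&pfd, 1, 1000) == 1)
        n = (int)recv(fd, buf, sizeof(buf), 0);
out:
    if (fd >= 0)
        close(fd);
    if (stray >= 0)
        close(stray);
    return n;
}
```

A connect() issued after that sendto() would narrow later matches to the full
4-tuple, but by then the unrelated datagram has already been delivered to the
socket.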
> > So, what is needed is an "atomic bind & connect" that basically
> > prevents any incoming traffic until the connect() call has been issued
> > at which point the full 4-tuple is known.
> >
> >
> > This patchset implements this functionality and exposes a socket-option
> > to do this.
> >
> > Usage would be:
> >
> > int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
> >
> > int val = 1;
> > setsockopt(fd, SOL_SOCKET, SO_DELAYED_BIND, &val, sizeof(val));
> >
> > bind(fd, (struct sockaddr *)&src, sizeof(src));
> >
> > /* At this point, incoming traffic will never match on this socket */
> >
> > connect(fd, (struct sockaddr *)&dst, sizeof(dst));
> >
> > /* Only now incoming traffic will reach the socket */
> >
> >
> >
> > There are many ways to implement this, which is why I am first
> > sending it out as an RFC. Here I chose the least invasive approach,
> > just preventing the match on the incoming path.
> >
> >
> > The reason for choosing a SOL_SOCKET socket-option rather than a
> > SOL_UDP-level one is that this functionality could be useful for
> > other protocols as well. E.g., TCP could make better use of the full
> > 4-tuple space by binding to the source-IP and the destination-IP at the
> > same time.
>
> Passive TCP flows can not benefit from this idea.
>
> Active TCP flows can already do that, I do not really understand what you are suggesting.

What we had here is that we wanted to let a server initiate more than 64K
connections *while* also binding to a source-IP.

With TCP, the bind() would then pick a source-port and we ended up hitting the
64K limit. If we could do an atomic "bind + connect", source-port selection
could ensure that the 4-tuple is unique.

Or has something changed recently that allows 4-tuple matching to be used
when doing this with TCP?
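
To illustrate the limit described above, a small sketch: binding a TCP socket
to a source IP with port 0 makes the kernel choose the source port at bind()
time, before any destination is known, so the choice cannot take the remote
address into account. The function name is made up for illustration.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to src_ip with port 0 and report the port the kernel
 * picked at bind() time, i.e. before connect() supplies a destination.
 * Returns the port in host byte order, or -1 on error. */
static int port_chosen_at_bind(const char *src_ip)
{
    struct sockaddr_in src;
    socklen_t len = sizeof(src);
    int port = -1;

    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd < 0)
        return -1;

    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = 0;                 /* ask the kernel for any free port */

    if (inet_pton(AF_INET, src_ip, &src.sin_addr) == 1 &&
        bind(fd, (struct sockaddr *)&src, sizeof(src)) == 0 &&
        getsockname(fd, (struct sockaddr *)&src, &len) == 0) {
        assert(len == sizeof(src));
        port = ntohs(src.sin_port);   /* already fixed, no peer known yet */
    }

    close(fd);
    return port;
}
```

Since each such socket holds its own local port on that source IP from
bind() onwards, the number of sockets prepared this way is bounded by the
local port range, regardless of how many distinct destinations they later
connect to.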
Christoph