[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAF=yD-JKapPXLA3nekk8V5VGXkp7hZd-iX6P1RdiGbVc-bOzGw@mail.gmail.com>
Date:   Fri, 31 Aug 2018 11:11:11 -0400
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Paolo Abeni <pabeni@...hat.com>
Cc:     Sowmini Varadhan <sowmini.varadhan@...cle.com>,
        Network Development <netdev@...r.kernel.org>,
        Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH RFC net-next 00/11] udp gso
On Fri, Aug 31, 2018 at 9:44 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On Fri, 2018-08-31 at 09:08 -0400, Willem de Bruijn wrote:
> > On Fri, Aug 31, 2018 at 5:09 AM Paolo Abeni <pabeni@...hat.com> wrote:
> > >
> > > Hi,
> > >
> > > On Tue, 2018-04-17 at 17:07 -0400, Willem de Bruijn wrote:
> > > > That said, for negotiated flows an inverse GRO feature could
> > > > conceivably be implemented to reduce rx stack traversal, too.
> > > > Though due to interleaving of packets on the wire, it aggregation
> > > > would be best effort, similar to TCP TSO and GRO using the
> > > > PSH bit as packetization signal.
> > >
> > > Reviving this old thread, before I forgot again. I have some local
> > > patches implementing UDP GRO in a dual way to current GSO_UDP_L4
> > > implementation: several datagram with the same length are aggregated
> > > into a single one, and the user space receive a single larger packet
> > > instead of multiple ones. I hope quic can leverage such scenario, but I
> > > really know nothing about the protocol.
> > >
> > > I measure roughly a 50% performance improvement with udpgso_bench in
> > > respect to UDP GSO, and ~100% using a pktgen sender, and a reduced CPU
> > > usage on the receiver[1]. Some additional hacking to the general GRO
> > > bits is required to avoid useless socket lookups for ingress UDP
> > > packets when UDP_GSO is not enabled.
> > >
> > > If there is interest on this topic, I can share some RFC patches
> > > (hopefully somewhat next week).
> >
> > As Eric pointed out, QUIC reception on mobile clients over the WAN
> > may not see much gain. But apparently there is a non-trivial amount
> > of traffic the other way, to servers. Again, WAN might limit whatever
> > gain we get, but I do want to look into that. And there are other UDP high
> > throughput workloads (with or without QUIC) between servers.
> >
> > If you have patches, please do share them.
>
> I'll try to clean them up and send them next week (as RFC).
>
> > I actually also have a rough
> > patch that I did not consider ready to share yet. Based on Tom's existing
> > socket lookup in udp_gro_receive to detect whether a local destination
> > exists and whether it has set an option to support receiving coalesced
> > payloads (along with a cmsg to share the segment size).
>
> That is more or less what I'm doing here.
> Side note: I had test it in baremetal, as veth/lo do not trigger the
> GRO path: selftest of this feature is not so straightforward.
>
> > Converting udp_recvmsg to split apart gso packets to make this
> > transparent seems to me to be too complex and not worth the effort.
>
> Agreed. Moreover doing many, small, recvmsg() instead of a single,
> large, one will hit the performances very badly due to PTI and
> HARDENED_USERCOPY.
>
> > If a local socket is not found in udp_gro_receive, this could also be
> > tentative interpreted as a non-local path (with false positives), enabling
> > transparent use of GRO + GSO batching on the forwarding path.
>
> That sounds interesting, even if false positive looks dangerous to me.
> Just to be on the same page, which false positive examples are you
> thinking at? UDP sockets bound to local address behind NAT?
Any packets that would otherwise get dropped, such as packets with
local destination, but no local bound socket. This may increase the
amount of cycles spent on such packets, potentially increasing a DoS
vector.
Powered by blists - more mailing lists
 
