linux-kernel - Re: [RFC net-next 1/9] skb: introduce gro

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAO3-Pbo_gNVP4qcEGNJe-RmPBy7CgFZab+dwwv2MyFiJRg9_fA@mail.gmail.com>
Date: Fri, 21 Jun 2024 10:17:02 -0500
From: Yan Zhai <yan@...udflare.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, 
	Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Jesper Dangaard Brouer <hawk@...nel.org>, John Fastabend <john.fastabend@...il.com>, 
	Willem de Bruijn <willemb@...gle.com>, Simon Horman <horms@...nel.org>, Florian Westphal <fw@...len.de>, 
	Mina Almasry <almasrymina@...gle.com>, Abhishek Chauhan <quic_abchauha@...cinc.com>, 
	David Howells <dhowells@...hat.com>, Alexander Lobakin <aleksander.lobakin@...el.com>, 
	David Ahern <dsahern@...nel.org>, Richard Gobert <richardbgobert@...il.com>, 
	Antoine Tenart <atenart@...nel.org>, Felix Fietkau <nbd@....name>, 
	Soheil Hassas Yeganeh <soheil@...gle.com>, Pavel Begunkov <asml.silence@...il.com>, 
	Lorenzo Bianconi <lorenzo@...nel.org>, Thomas Weißschuh <linux@...ssschuh.net>, 
	linux-kernel@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [RFC net-next 1/9] skb: introduce gro_disabled bit

On Fri, Jun 21, 2024 at 4:57 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On Thu, 2024-06-20 at 15:19 -0700, Yan Zhai wrote:
> > Software GRO is currently controlled by a single switch, i.e.
> >
> >   ethtool -K dev gro on|off
> >
> > However, this is not always desired. When GRO is enabled, even if the
> > kernel cannot GRO certain traffic, it has to run through the GRO receive
> > handlers with no benefit.
> >
> > There are also scenarios that turning off GRO is a requirement. For
> > example, our production environment has a scenario that a TC egress hook
> > may add multiple encapsulation headers to forwarded skbs for load
> > balancing and isolation purpose. The encapsulation is implemented via
> > BPF. But the problem arises then: there is no way to properly offload a
> > double-encapsulated packet, since skb only has network_header and
> > inner_network_header to track one layer of encapsulation, but not two.
> > On the other hand, not all the traffic through this device needs double
> > encapsulation. But we have to turn off GRO completely for any ingress
> > device as a result.
>
> Could you please add more details WRT this last statement? I'm unsure
> if I understand your problem. My guess is as follow:
>
> Your device receive some traffic, GRO and forward it, and the multiple
> encapsulation can happen on such forwarded traffic (since I can't find
> almost none of the above your message is mainly a wild guess).
>
> Assuming I guessed correctly, I think you could solve the problem with
> no kernel changes: redirect the to-be-tunneled traffic to some virtual
> device and all TX offload on top of it and let the encap happen there.
>
Let's say we have a netns to implement network functions like
DoS/IDS/Load balancing for IP traffic. The netns has a single veth
entrance/exit, and a bunch of ip tunnels, GRE/XFRM, to receive and
tunnel traffic from customer's private sites. Some of such traffic
could be encapsulated to reach services outside of the netns (but on
the same server), for example, customers may also want to use our
CDN/Caching functionality. The complication here is that we might have
to further tunnel traffic to another data center, because the routing
is asymmetric so we can receive client traffic from US but the
response may come back to our EU data center, and in order to do
layer4/layer7 service, we have to make sure those land on the same
server.

It is true that a device like a veth pair or even netkit could allow
the kernel segment GRO packets for us. But this does not sound
actually right in terms of design: if we know already some packet path
should not be GRO-ed, can we enforce this rather than having to
aggregate it then chop it down soon after? For our specific case
though, it also becomes a headache for analytics and customer rules
that rely on ingress device name, we probably need to pair each tunnel
with such a virtual device. There could be hundreds of ipsec tunnels,
and that seems to be a substantial overhead for both data path and
control plane management.

To make this a bit more general, what I'd like to introduce here is:
when we know GRO is either problematic or simply not useful (like to
some UDP traffic), can we have more control toggle to skip it?

thanks
Yan

> Cheers,
>
> Paolo
>