Message-ID: <CAO3-Pbo_gNVP4qcEGNJe-RmPBy7CgFZab+dwwv2MyFiJRg9_fA@mail.gmail.com>
Date: Fri, 21 Jun 2024 10:17:02 -0500
From: Yan Zhai <yan@...udflare.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, 
	Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Jesper Dangaard Brouer <hawk@...nel.org>, John Fastabend <john.fastabend@...il.com>, 
	Willem de Bruijn <willemb@...gle.com>, Simon Horman <horms@...nel.org>, Florian Westphal <fw@...len.de>, 
	Mina Almasry <almasrymina@...gle.com>, Abhishek Chauhan <quic_abchauha@...cinc.com>, 
	David Howells <dhowells@...hat.com>, Alexander Lobakin <aleksander.lobakin@...el.com>, 
	David Ahern <dsahern@...nel.org>, Richard Gobert <richardbgobert@...il.com>, 
	Antoine Tenart <atenart@...nel.org>, Felix Fietkau <nbd@....name>, 
	Soheil Hassas Yeganeh <soheil@...gle.com>, Pavel Begunkov <asml.silence@...il.com>, 
	Lorenzo Bianconi <lorenzo@...nel.org>, Thomas Weißschuh <linux@...ssschuh.net>, 
	linux-kernel@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [RFC net-next 1/9] skb: introduce gro_disabled bit

On Fri, Jun 21, 2024 at 4:57 AM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On Thu, 2024-06-20 at 15:19 -0700, Yan Zhai wrote:
> > Software GRO is currently controlled by a single switch, i.e.
> >
> >   ethtool -K dev gro on|off
> >
> > However, this is not always desired. When GRO is enabled, even if the
> > kernel cannot GRO certain traffic, it has to run through the GRO receive
> > handlers with no benefit.
> >
> > There are also scenarios where turning off GRO is a requirement. For
> > example, in our production environment a TC egress hook may add
> > multiple encapsulation headers to forwarded skbs for load balancing
> > and isolation purposes. The encapsulation is implemented via BPF. But
> > then a problem arises: there is no way to properly offload a
> > double-encapsulated packet, since an skb only has network_header and
> > inner_network_header to track one layer of encapsulation, not two.
> > On the other hand, not all the traffic through this device needs double
> > encapsulation. But we have to turn off GRO completely for any ingress
> > device as a result.
>
> Could you please add more details WRT this last statement? I'm unsure
> if I understand your problem. My guess is as follows:
>
> Your device receives some traffic, GROs and forwards it, and the
> multiple encapsulation can happen on such forwarded traffic (since I
> can find almost none of the above in your message, this is mainly a
> wild guess).
>
> Assuming I guessed correctly, I think you could solve the problem with
> no kernel changes: redirect the to-be-tunneled traffic to some virtual
> device, enable all TX offloads on top of it, and let the encap happen
> there.
>
Let's say we have a netns that implements network functions like
DoS/IDS/load balancing for IP traffic. The netns has a single veth
entrance/exit, and a bunch of IP tunnels (GRE/XFRM) to receive and
tunnel traffic from customers' private sites. Some of this traffic
could be encapsulated to reach services outside of the netns (but on
the same server); for example, customers may also want to use our
CDN/caching functionality. The complication here is that we might have
to further tunnel traffic to another data center, because the routing
is asymmetric: we can receive client traffic in the US while the
response comes back to our EU data center, and in order to provide
layer 4/layer 7 services, we have to make sure both land on the same
server.

It is true that a device like a veth pair or even netkit could let
the kernel segment GRO packets for us. But this does not sound quite
right in terms of design: if we already know that some packet path
should not be GRO-ed, can we enforce this rather than having to
aggregate the packets and then chop them up again soon after? For our
specific case, it also becomes a headache for analytics and customer
rules that rely on the ingress device name: we would probably need to
pair each tunnel with such a virtual device. There could be hundreds
of IPsec tunnels, and that seems to be a substantial overhead for both
the data path and control plane management.

To make this a bit more general, what I'd like to introduce here is:
when we know GRO is either problematic or simply not useful (as for
some UDP traffic), can we have a finer-grained toggle to skip it?
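
As a rough illustration of that idea, here is a minimal userspace
sketch, not kernel code. It assumes a per-packet flag, named
gro_disabled after the bit in the patch subject, that is consulted next
to the device-wide GRO switch. struct pkt, enum rx_path and
rx_select_path() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb; NOT the kernel's struct sk_buff. */
struct pkt {
	size_t len;
	bool gro_disabled;	/* assumed to be set early, e.g. by an ingress hook */
};

/* The two receive paths this sketch distinguishes. */
enum rx_path {
	RX_PATH_GRO,		/* run the GRO receive handlers */
	RX_PATH_DIRECT,		/* skip GRO and deliver the packet as-is */
};

/*
 * Pick the receive path for one packet. dev_gro_enabled models the
 * device-wide "ethtool -K dev gro on|off" switch; the per-packet flag
 * can only opt out of GRO, never force it on.
 */
static enum rx_path rx_select_path(const struct pkt *p, bool dev_gro_enabled)
{
	assert(p != NULL);

	if (!dev_gro_enabled || p->gro_disabled)
		return RX_PATH_DIRECT;

	return RX_PATH_GRO;
}
```

With this shape, GRO can stay enabled on the device while only the
flows that cannot benefit from it, or that break with it, bypass the
handlers.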

thanks
Yan

> Cheers,
>
> Paolo
>
