lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 21 Jun 2024 18:15:01 +0200
From: Daniel Borkmann <daniel@...earbox.net>
To: Yan Zhai <yan@...udflare.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
 netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, Alexei Starovoitov <ast@...nel.org>,
 Jesper Dangaard Brouer <hawk@...nel.org>,
 John Fastabend <john.fastabend@...il.com>,
 Willem de Bruijn <willemb@...gle.com>, Simon Horman <horms@...nel.org>,
 Florian Westphal <fw@...len.de>, Mina Almasry <almasrymina@...gle.com>,
 Abhishek Chauhan <quic_abchauha@...cinc.com>,
 David Howells <dhowells@...hat.com>,
 Alexander Lobakin <aleksander.lobakin@...el.com>,
 David Ahern <dsahern@...nel.org>, Richard Gobert <richardbgobert@...il.com>,
 Antoine Tenart <atenart@...nel.org>, Felix Fietkau <nbd@....name>,
 Soheil Hassas Yeganeh <soheil@...gle.com>,
 Pavel Begunkov <asml.silence@...il.com>,
 Lorenzo Bianconi <lorenzo@...nel.org>,
 Thomas Weißschuh <linux@...ssschuh.net>,
 linux-kernel@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [RFC net-next 1/9] skb: introduce gro_disabled bit

On 6/21/24 6:00 PM, Yan Zhai wrote:
> On Fri, Jun 21, 2024 at 8:13 AM Daniel Borkmann <daniel@...earbox.net> wrote:
>> On 6/21/24 2:15 PM, Willem de Bruijn wrote:
>>> Yan Zhai wrote:
>>>> Software GRO is currently controlled by a single switch, i.e.
>>>>
>>>>     ethtool -K dev gro on|off
>>>>
>>>> However, this is not always desired. When GRO is enabled, even if the
>>>> kernel cannot GRO certain traffic, it has to run through the GRO receive
>>>> handlers with no benefit.
>>>>
>>>> There are also scenarios that turning off GRO is a requirement. For
>>>> example, our production environment has a scenario that a TC egress hook
>>>> may add multiple encapsulation headers to forwarded skbs for load
>>>> balancing and isolation purpose. The encapsulation is implemented via
>>>> BPF. But the problem arises then: there is no way to properly offload a
>>>> double-encapsulated packet, since skb only has network_header and
>>>> inner_network_header to track one layer of encapsulation, but not two.
>>>> On the other hand, not all the traffic through this device needs double
>>>> encapsulation. But we have to turn off GRO completely for any ingress
>>>> device as a result.
>>>>
>>>> Introduce a bit on skb so that GRO engine can be notified to skip GRO on
>>>> this skb, rather than having to be 0-or-1 for all traffic.
>>>>
>>>> Signed-off-by: Yan Zhai <yan@...udflare.com>
>>>> ---
>>>>    include/linux/netdevice.h |  9 +++++++--
>>>>    include/linux/skbuff.h    | 10 ++++++++++
>>>>    net/Kconfig               | 10 ++++++++++
>>>>    net/core/gro.c            |  2 +-
>>>>    net/core/gro_cells.c      |  2 +-
>>>>    net/core/skbuff.c         |  4 ++++
>>>>    6 files changed, 33 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index c83b390191d4..2ca0870b1221 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -2415,11 +2415,16 @@ struct net_device {
>>>>       ((dev)->devlink_port = (port));                         \
>>>>    })
>>>>
>>>> -static inline bool netif_elide_gro(const struct net_device *dev)
>>>> +static inline bool netif_elide_gro(const struct sk_buff *skb)
>>>>    {
>>>> -    if (!(dev->features & NETIF_F_GRO) || dev->xdp_prog)
>>>> +    if (!(skb->dev->features & NETIF_F_GRO) || skb->dev->xdp_prog)
>>>>               return true;
>>>> +
>>>> +#ifdef CONFIG_SKB_GRO_CONTROL
>>>> +    return skb->gro_disabled;
>>>> +#else
>>>>       return false;
>>>> +#endif
>>>
>>> Yet more branches in the hot path.
>>>
>>> Compile time configurability does not help, as that will be
>>> enabled by distros.
>>>
>>> For a fairly niche use case. Where functionality of GRO already
>>> works. So just a performance for a very rare case at the cost of a
>>> regression in the common case. A small regression perhaps, but death
>>> by a thousand cuts.
>>
>> Mentioning it here b/c it perhaps fits in this context, longer time ago
>> there was the idea mentioned to have BPF operating as GRO engine which
>> might also help to reduce attack surface by only having to handle packets
>> of interest for the concrete production use case. Perhaps here meta data
>> buffer could be used to pass a notification from XDP to exit early w/o
>> aggregation.
> 
> Metadata is in fact one of our interests as well. We discussed using
> metadata instead of a skb bit to carry this information internally.
> Since metadata is opaque atm so it seems the only option is to have a
> GRO control hook before napi_gro_receive, and let BPF decide
> netif_receive_skb or napi_gro_receive (echo what Paolo said). With BPF
> it could indeed be more flexible, but the cons is that it could be
> even more slower than taking a bit on skb. I am actually open to
> either approach, as long as it gives us more control on when to enable
> GRO :)

Oh wait, one thing that just came to mind.. have you tried u64 per-CPU
counter map in XDP? For packets which should not be GRO-aggregated you
add count++ into the meta data area, and this forces GRO to not aggregate
since meta data that needs to be transported to tc BPF layer mismatches
(and therefore the contract/intent is that tc BPF needs to see the different
meta data passed to it).

Thanks,
Daniel

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ