Message-ID: <willemdebruijn.kernel.38eb3bd85943@gmail.com>
Date: Fri, 12 Sep 2025 18:47:08 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Stanislav Fomichev <stfomichev@...il.com>,
Tobias Böhm <tobias.boehm@...zner-cloud.de>
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>,
bpf@...r.kernel.org,
Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de>,
netdev@...r.kernel.org,
willemdebruijn.kernel@...il.com,
william.xuanziyang@...wei.com
Subject: Re: [BUG?] bpf_skb_net_shrink does not unset encapsulation flag
Stanislav Fomichev wrote:
> On 09/10, Tobias Böhm wrote:
> > Hi,
> >
> > when decapsulating VXLAN packets with bpf_skb_adjust_room and redirecting to
> > a tap device I observed unexpected segmentation.
> >
> > In my setup there is a sched_cls program attached at the ingress path of a
> > physical NIC with GRO enabled. Packets are redirected either directly for
> > plain traffic, or decapsulated beforehand in case of VXLAN. Decapsulation is
> > done by bpf_skb_adjust_room with BPF_F_ADJ_ROOM_DECAP_L3_IPV4.
> >
> > For both kinds of traffic GRO on the physical NIC works as expected
> > resulting in merged packets.
> >
> > Large non-decapsulated packets are transmitted directly on the tap interface
> > as expected. But surprisingly, decapsulated packets are being segmented
> > again before transmission.
> >
> > When analyzing and comparing the call chains I observed that
> > netif_skb_features returns different values for the two kinds of
> > traffic.
> >
> > The tap devices have the following features set:
> >
> > dev->features = 0x1558c9
> > dev->hw_enc_features = 0x10000001
> >
> > For the non-decapsulated traffic netif_skb_features returns 0x1558c9 but for
> > the decapsulated traffic it returns 0x1. This is the same value as the result
> > of "dev->features & dev->hw_enc_features".
> >
> > In netif_skb_features this operation effectively happens in case
> > skb->encapsulation is set. Inspecting the skb in both cases showed that in
> > case of decapsulation the skb->encapsulation flag was indeed still set.
> >
> > I wonder if there is a reason that the skb->encapsulation flag is not unset
> > in bpf_skb_net_shrink when BPF_F_ADJ_ROOM_DECAP_* flags are present? Since
> > skb->encapsulation is set in bpf_skb_net_grow when adding space for
> > encapsulation my expectation would be that the flag is also unset when doing
> > the opposite operation.
>
> + Willem and netdev for visibility.
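
For reference, the masking described in the report can be replayed with a
small user-space model. This is only a sketch: struct model_dev and
model_skb_features() are made-up names for illustration, and the real
netif_skb_features() does considerably more than this one step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the two net_device fields quoted in the report. */
struct model_dev {
	uint64_t features;
	uint64_t hw_enc_features;
};

/*
 * Models only the step discussed in this thread: when the encapsulation
 * flag is set, the usable features are limited to hw_enc_features.
 */
static uint64_t model_skb_features(const struct model_dev *dev,
				   bool encapsulation)
{
	uint64_t features = dev->features;

	if (encapsulation)
		features &= dev->hw_enc_features;
	return features;
}

/* Replays the numbers reported for the tap device. */
static void model_check_report_values(void)
{
	const struct model_dev tap = {
		.features = 0x1558c9,
		.hw_enc_features = 0x10000001,
	};

	/* Flag clear: the full feature set is returned. */
	assert(model_skb_features(&tap, false) == 0x1558c9);
	/* Flag still set after decap: only bit 0 survives the mask. */
	assert(model_skb_features(&tap, true) == 0x1);
}
```

With the flag left set, everything except bit 0 is masked out of the
feature set, which matches the 0x1 seen in the report.
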
I think it just has not been implemented before.

The encap path is more strict. Besides setting skb->encapsulation, it
also initializes the inner_.. helpers.

The decap path does not do this, it expects IPIP packets to arrive
from the network, without the stack detecting them as such or
setting skb->encapsulation.

We must preserve that behavior. But we additionally can detect skbs
with encapsulation fields configured, and convert those.

The encap path also takes explicit UDP_L4 and GRE flags to update GSO
packets. For VXLAN decap, we probably need the same?
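
To make the idea concrete, here is a rough user-space sketch of what such a
conversion could look like. It is not kernel code and not a proposed patch:
struct model_skb, model_decap(), the MODEL_GSO_* bits and the MODEL_DECAP_L4_*
flags are all hypothetical names invented for this illustration, and the bit
values do not match the kernel's SKB_GSO_* definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical GSO type bits for this model only. */
#define MODEL_GSO_TCPV4      (1u << 0)
#define MODEL_GSO_UDP_TUNNEL (1u << 1)
#define MODEL_GSO_GRE        (1u << 2)

/* Hypothetical decap flags; assumed for illustration, not existing UAPI. */
#define MODEL_DECAP_L4_UDP   (1u << 0)
#define MODEL_DECAP_L4_GRE   (1u << 1)

/* Toy stand-in for the skb state discussed in this thread. */
struct model_skb {
	bool encapsulation;
	uint32_t gso_type;
};

static void model_decap(struct model_skb *skb, uint32_t flags)
{
	/*
	 * Preserve existing behavior: packets that arrived without
	 * encapsulation state configured are left untouched.
	 */
	if (!skb->encapsulation)
		return;

	/* Convert skbs that do carry encapsulation state. */
	skb->encapsulation = false;

	/* Mirror the encap side: drop the tunnel GSO bit the caller names. */
	if (flags & MODEL_DECAP_L4_UDP)
		skb->gso_type &= ~MODEL_GSO_UDP_TUNNEL;
	if (flags & MODEL_DECAP_L4_GRE)
		skb->gso_type &= ~MODEL_GSO_GRE;
}
```

In this model a VXLAN packet that GRO marked as a UDP tunnel ends up with
encapsulation cleared and only the inner GSO type left, while an skb that
never had encapsulation set passes through unchanged.
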