Message-ID: <willemdebruijn.kernel.38eb3bd85943@gmail.com>
Date: Fri, 12 Sep 2025 18:47:08 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Stanislav Fomichev <stfomichev@...il.com>, 
 Tobias Böhm <tobias.boehm@...zner-cloud.de>
Cc: Alexei Starovoitov <ast@...nel.org>, 
 Daniel Borkmann <daniel@...earbox.net>, 
 Andrii Nakryiko <andrii@...nel.org>, 
 bpf@...r.kernel.org, 
 Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de>, 
 netdev@...r.kernel.org, 
 willemdebruijn.kernel@...il.com, 
 william.xuanziyang@...wei.com
Subject: Re: [BUG?] bpf_skb_net_shrink does not unset encapsulation flag

Stanislav Fomichev wrote:
> On 09/10, Tobias Böhm wrote:
> > Hi,
> > 
> > when decapsulating VXLAN packets with bpf_skb_adjust_room and redirecting to
> > a tap device I observed unexpected segmentation.
> > 
> > In my setup there is a sched_cls program attached at the ingress path of a
> > physical NIC with GRO enabled. Packets are redirected either directly for
> > plain traffic, or decapsulated beforehand in case of VXLAN. Decapsulation is
> > done by bpf_skb_adjust_room with BPF_F_ADJ_ROOM_DECAP_L3_IPV4.
> > 
> > For both kinds of traffic GRO on the physical NIC works as expected
> > resulting in merged packets.
> > 
> > Large non-decapsulated packets are transmitted directly on the tap interface
> > as expected. But surprisingly, decapsulated packets are being segmented
> > again before transmission.
> > 
> > When analyzing and comparing the call chains I observed that
> > netif_skb_features returns different values for the two kinds of
> > traffic.
> > 
> > The tap devices have the following features set:
> > 
> >     dev->features        =   0x1558c9
> >     dev->hw_enc_features = 0x10000001
> > 
> > For the non-decapsulated traffic netif_skb_features returns 0x1558c9 but for
> > the decapsulated traffic it returns 0x1. This is the same value as the result of
> > "dev->features & dev->hw_enc_features".
> > 
> > In netif_skb_features this operation effectively happens in case
> > skb->encapsulation is set. Inspecting the skb in both cases showed that in
> > case of decapsulation the skb->encapsulation flag was indeed still set.
> > 
> > I wonder if there is a reason that the skb->encapsulation flag is not unset
> > in bpf_skb_net_shrink when BPF_F_ADJ_ROOM_DECAP_* flags are present? Since
> > skb->encapsulation is set in bpf_skb_net_grow when adding space for
> > encapsulation my expectation would be that the flag is also unset when doing
> > the opposite operation.
> 
> + Willem and netdev for visibility.
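
For reference, the masking described in the report can be modelled in a
few lines of userspace C. This is only a sketch: the struct and function
names are hypothetical stand-ins, and only the single step mentioned
above (ANDing with hw_enc_features when skb->encapsulation is set) is
modelled, not the rest of netif_skb_features.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * mentioned in the report are modelled. */
struct model_dev {
	uint64_t features;
	uint64_t hw_enc_features;
};

struct model_skb {
	bool encapsulation;
};

/* Models only the masking step from the report: when the encapsulation
 * flag is set, the device features are ANDed with hw_enc_features. */
uint64_t model_skb_features(const struct model_dev *dev,
			    const struct model_skb *skb)
{
	uint64_t features = dev->features;

	if (skb->encapsulation)
		features &= dev->hw_enc_features;

	return features;
}
```

With the tap values from the report (features 0x1558c9, hw_enc_features
0x10000001), the only common bit is bit 0, so an skb that still has the
flag set ends up with 0x1, while one without it keeps 0x1558c9.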

I think it just has not been implemented before.

The encap path is more strict. Besides setting skb->encapsulation, it
also initializes the inner_.. helpers.

The decap path does not do this. It expects IPIP packets to arrive
from the network, without the stack detecting them as such or
setting skb->encapsulation.

We must preserve that behavior. But we can additionally detect skbs
with encapsulation fields configured, and convert those.
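
A rough userspace sketch of that rule, assuming the conversion amounts
to clearing the flag. This is not a patch: the MODEL_* bit values are
placeholders rather than the UAPI values, and the struct and function
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits for this sketch; not copied from the UAPI header. */
#define MODEL_DECAP_L3_IPV4 (1ULL << 0)
#define MODEL_DECAP_L3_IPV6 (1ULL << 1)
#define MODEL_DECAP_L3_MASK (MODEL_DECAP_L3_IPV4 | MODEL_DECAP_L3_IPV6)

/* Hypothetical stand-in for the one skb field of interest. */
struct model_decap_skb {
	bool encapsulation;
};

/* Returns true if the skb was converted, false if left untouched. */
bool model_shrink_fixup(struct model_decap_skb *skb, uint64_t flags)
{
	/* No decap requested: nothing to do. */
	if (!(flags & MODEL_DECAP_L3_MASK))
		return false;

	/* Packet arrived from the network without the stack marking it
	 * as encapsulated: keep the existing behavior. */
	if (!skb->encapsulation)
		return false;

	/* Encapsulation state was configured: convert the skb. */
	skb->encapsulation = false;
	return true;
}
```

The early return on a clear flag is what keeps the existing IPIP case
unchanged; only skbs that already carry encapsulation state are touched.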

The encap path also takes explicit UDP_L4 and GRE flags to update GSO
packets. For VXLAN decap, we probably need the same?
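
To make the question concrete, here is a userspace sketch of what a
mirrored decap step could look like. Everything in it is an assumption:
the MODEL_DECAP_L4_* flags are hypothetical and not defined anywhere in
this thread, the GSO type bits are placeholders, and the struct only
models the two fields needed for the illustration.

```c
#include <stdint.h>

/* Hypothetical decap flags; no such flags are defined in this thread. */
#define MODEL_DECAP_L4_UDP (1ULL << 0)
#define MODEL_DECAP_L4_GRE (1ULL << 1)

/* Placeholder GSO type bits; not the kernel's values. */
#define MODEL_GSO_TCPV4      (1U << 0)
#define MODEL_GSO_UDP_TUNNEL (1U << 1)
#define MODEL_GSO_GRE        (1U << 2)

/* Hypothetical stand-in for the GSO part of the shared info. */
struct model_gso_info {
	uint32_t gso_type;
	uint16_t gso_size;
};

/* Clears the tunnel GSO type that matches the removed L4 header.
 * Non-GSO packets (gso_size == 0) are left untouched. */
void model_decap_gso_fixup(struct model_gso_info *shinfo, uint64_t flags)
{
	if (!shinfo->gso_size)
		return;

	if (flags & MODEL_DECAP_L4_UDP)
		shinfo->gso_type &= ~MODEL_GSO_UDP_TUNNEL;
	if (flags & MODEL_DECAP_L4_GRE)
		shinfo->gso_type &= ~MODEL_GSO_GRE;
}
```

For a VXLAN packet carrying TCP, calling this with MODEL_DECAP_L4_UDP
would leave only the inner TCP GSO type set.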


