Message-ID: <20250425093337.GN3042781@horms.kernel.org>
Date: Fri, 25 Apr 2025 10:33:37 +0100
From: Simon Horman <horms@...nel.org>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: kuba@...nel.org, netdev@...r.kernel.org, bpf@...r.kernel.org,
bcm-kernel-feedback-list@...adcom.com,
Andrew Sauber <andrew.sauber@...valent.com>,
Anton Protopopov <aspsk@...valent.com>,
William Tu <witu@...dia.com>, Martin Zaharinov <micron10@...il.com>,
Ronak Doshi <ronak.doshi@...adcom.com>
Subject: Re: [PATCH net] vmxnet3: Fix malformed packet sizing in
vmxnet3_process_xdp
On Wed, Apr 23, 2025 at 03:36:00PM +0200, Daniel Borkmann wrote:
> vmxnet3 driver's XDP handling is buggy for packet sizes using ring0 (that
> is, packet sizes between 128 and 3K bytes).
>
> We noticed MTU-related connectivity issues with Cilium's service load-
> balancing when vmxnet3 was the underlying NIC. A simple curl to an HTTP
> backend service where the XDP LB was doing IPIP encap led to overly
> large packet sizes, but only for *some* of the packets (e.g. the HTTP
> GET request), while others (e.g. the prior TCP 3WHS) looked completely
> fine on the wire.
>
> In fact, the pcap recording on the backend node revealed that the node
> with the XDP LB was leaking uninitialized kernel data onto the wire for
> the affected packets. For example, while the packets should have been
> 152 bytes, their actual size was 1482 bytes, so the remainder after 152
> bytes was padded with whatever other data was in that page at the time
> (e.g. we saw user/payload data from previously processed packets).
>
> We only noticed this through an MTU issue: when the XDP LB node and the
> backend node both had the same MTU (e.g. 1500), the curl request got
> dropped on the backend node's NIC because the packet was too large,
> even though the IPIP-encapped packet would normally never come close to
> the MTU limit. Lowering the MTU on the XDP LB (e.g. to 1480) allowed
> the curl request to succeed (which also indicates that the kernel
> ignored the padding, and thus the issue wasn't very user-visible).
>
> Commit e127ce7699c1 ("vmxnet3: Fix missing reserved tailroom") was too
> eager in also switching xdp_prepare_buff() from rcd->len to rbi->len.
> It really needs to stick to rcd->len, which is the actual packet length
> from the descriptor. We also feed the latter into
> vmxnet3_process_xdp_small(), by the way, and it indicates the correct
> length needed to initialize the xdp->{data,data_end} parts. For
> e127ce7699c1 ("vmxnet3: Fix missing reserved tailroom") the relevant
> part was adapting xdp_init_buff() to address the warning, given that
> xdp_data_hard_end() depends on xdp->frame_sz. With that fixed, traffic
> on the wire looks good again.
>
> Fixes: e127ce7699c1 ("vmxnet3: Fix missing reserved tailroom")
> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
> Tested-by: Andrew Sauber <andrew.sauber@...valent.com>
> Cc: Anton Protopopov <aspsk@...valent.com>
> Cc: William Tu <witu@...dia.com>
> Cc: Martin Zaharinov <micron10@...il.com>
> Cc: Ronak Doshi <ronak.doshi@...adcom.com>
Reviewed-by: Simon Horman <horms@...nel.org>