Message-ID: <87zfemtbah.fsf@toke.dk>
Date: Thu, 05 Jun 2025 18:15:18 +0200
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de>, Jesper Dangaard
 Brouer <hawk@...nel.org>, bpf@...r.kernel.org, netdev@...r.kernel.org
Cc: Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann
 <daniel@...earbox.net>, John Fastabend <john.fastabend@...il.com>, Andrew
 Lunn <andrew+netdev@...n.ch>, "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, Jamal Hadi Salim <jhs@...atatu.com>, Cong
 Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
 linux-kernel@...r.kernel.org
Subject: Re: [BUG] veth: TX drops with NAPI enabled and crash in combination
 with qdisc

Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de> writes:

> Hi,
>
> while experimenting with XDP_REDIRECT from a veth-pair to another interface, I
> noticed that the veth-pair loses lots of packets when multiple TCP streams go
> through it, resulting in stalling TCP connections and noticeable instabilities.
>
> This doesn't seem to be an issue with just XDP but rather occurs whenever the
> NAPI mode of the veth driver is active.
> I managed to reproduce the same behavior just by bringing the veth-pair into
> NAPI mode (see commit d3256efd8e8b ("veth: allow enabling NAPI even without
> XDP")) and running multiple TCP streams through it using a network namespace.
>
> Here is how I reproduced it:
>
>   ip netns add lb
>   ip link add dev to-lb type veth peer name in-lb netns lb
>
>   # Enable NAPI
>   ethtool -K to-lb gro on
>   ethtool -K to-lb tso off
>   ip netns exec lb ethtool -K in-lb gro on
>   ip netns exec lb ethtool -K in-lb tso off
>
>   ip link set dev to-lb up
>   ip -netns lb link set dev in-lb up
>
> Then run a HTTP server inside the "lb" namespace that serves a large file:
>
>   fallocate -l 10G testfiles/10GB.bin
>   caddy file-server --root testfiles/
>
> Download this file from within the root namespace multiple times in parallel:
>
>   curl http://[fe80::...%to-lb]/10GB.bin -o /dev/null
>
> In my tests, I ran four curls in parallel, and after just a few seconds three
> of them stalled while the remaining one "won" the full bandwidth and completed
> the download.
>
> This is probably a result of the veth's ptr_ring running full, causing many
> packet drops on TX, and the TCP congestion control reacting to that.
>
> In this context, I also took notice of Jesper's patch which describes a very
> similar issue and should help to resolve this:
>   commit dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to
>   reduce TX drops")
>
> But when repeating the above test with the latest mainline, which includes this
> patch, and enabling a qdisc via
>   tc qdisc add dev in-lb root sfq perturb 10
> the kernel crashed just after starting the second TCP stream (see output below).
>
> So I have two questions:
> - Is my understanding of the described issue correct and is Jesper's patch
>   sufficient to solve this?

Hmm, yeah, this does sound likely.
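
One way to check that hypothesis while the curls are running is to watch
the standard interface counters, assuming the ring-full drops end up in
the regular TX "dropped" counter on the transmitting peer (I think they
do, but worth double-checking):

  ip -s link show dev to-lb

If the TX "dropped" counter keeps climbing while the streams stall, that
points pretty directly at the ptr_ring filling up.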

> - Is my qdisc configuration to make use of this patch correct, and is the
>   kernel crash likely a bug?
>
> ------------[ cut here ]------------
> UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:203:12
> index 65535 is out of range for type 'sfq_head [128]'

This (the 'index 65535') kinda screams "integer underflow". So certainly
looks like a kernel bug, yeah. Don't see any obvious reason why Jesper's
patch would trigger this; maybe Eric has an idea?
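
For reference, 65535 is 0xffff, which is exactly what a 16-bit unsigned
index turns into when it wraps below zero; a quick sanity check of the
wraparound with plain shell arithmetic (nothing kernel-specific):

  printf '%d\n' $(( (0 - 1) & 0xffff ))   # prints 65535

So a u16 queue index underflowing somewhere would fit the symptom.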

Does this happen with other qdiscs as well, or is it specific to sfq?
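
E.g. swapping the qdisc in the same setup, something like:

  tc qdisc replace dev in-lb root fq_codel

would tell us whether it's the backpressure path in general or the sfq
enqueue/dequeue logic in particular.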

-Toke

