Message-ID: <87zfemtbah.fsf@toke.dk>
Date: Thu, 05 Jun 2025 18:15:18 +0200
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de>, Jesper Dangaard
Brouer <hawk@...nel.org>, bpf@...r.kernel.org, netdev@...r.kernel.org
Cc: Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann
<daniel@...earbox.net>, John Fastabend <john.fastabend@...il.com>, Andrew
Lunn <andrew+netdev@...n.ch>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Jamal Hadi Salim <jhs@...atatu.com>, Cong
Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>,
linux-kernel@...r.kernel.org
Subject: Re: [BUG] veth: TX drops with NAPI enabled and crash in combination
with qdisc

Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de> writes:

> Hi,
>
> while experimenting with XDP_REDIRECT from a veth-pair to another interface, I
> noticed that the veth-pair loses lots of packets when multiple TCP streams go
> through it, resulting in stalling TCP connections and noticeable instabilities.
>
> This doesn't seem to be an issue with just XDP but rather occurs whenever the
> NAPI mode of the veth driver is active.
> I managed to reproduce the same behavior just by bringing the veth-pair into
> NAPI mode (see commit d3256efd8e8b ("veth: allow enabling NAPI even without
> XDP")) and running multiple TCP streams through it using a network namespace.
>
> Here is how I reproduced it:
>
> ip netns add lb
> ip link add dev to-lb type veth peer name in-lb netns lb
>
> # Enable NAPI
> ethtool -K to-lb gro on
> ethtool -K to-lb tso off
> ip netns exec lb ethtool -K in-lb gro on
> ip netns exec lb ethtool -K in-lb tso off
>
> ip link set dev to-lb up
> ip -netns lb link set dev in-lb up
>
> Then run an HTTP server inside the "lb" namespace that serves a large file:
>
> fallocate -l 10G testfiles/10GB.bin
> caddy file-server --root testfiles/
>
> Download this file from within the root namespace multiple times in parallel:
>
> curl http://[fe80::...%to-lb]/10GB.bin -o /dev/null
>
> In my tests, I ran four curl downloads in parallel, and after just a few
> seconds three of them stalled while the remaining one "won" the full
> bandwidth and completed the download.
>
> This is probably a result of the veth's ptr_ring running full, causing many
> packet drops on TX, and the TCP congestion control reacting to that.
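
(For context, a rough sketch of what the NAPI-mode TX path does when the
peer's ptr_ring is full. This is a simplified illustration, not the actual
drivers/net/veth.c code; the function and parameter names are made up:)

#include <linux/netdevice.h>
#include <linux/ptr_ring.h>
#include <linux/skbuff.h>

/* Simplified sketch: in NAPI/XDP mode the sender hands the skb to the
 * peer's ptr_ring. If the ring is full, the skb is freed and counted as a
 * TX drop; there is no backpressure towards the sender, so TCP only
 * notices via the lost packets.
 */
static netdev_tx_t sketch_veth_xmit(struct sk_buff *skb,
				    struct ptr_ring *peer_ring,
				    struct napi_struct *peer_napi)
{
	if (ptr_ring_produce(peer_ring, skb)) {
		/* Ring full: drop the packet on the floor. */
		dev_kfree_skb_any(skb);
		/* (the real driver bumps a TX drop counter here) */
		return NETDEV_TX_OK;
	}

	/* Tell the peer's NAPI poller that there is work to do. */
	napi_schedule(peer_napi);
	return NETDEV_TX_OK;
}
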
>
> In this context, I also took notice of Jesper's patch which describes a very
> similar issue and should help to resolve this:
> commit dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to
> reduce TX drops")
>
> But when repeating the above test with the latest mainline kernel, which
> includes this patch, and enabling a qdisc via
> tc qdisc add dev in-lb root sfq perturb 10
> the kernel crashed just after starting the second TCP stream (see output below).
>
> So I have two questions:
> - Is my understanding of the described issue correct, and is Jesper's patch
> sufficient to solve this?

Hmm, yeah, this does sound likely.

> - Is my qdisc configuration correct for making use of this patch, and is the
> kernel crash therefore likely a bug?
>
> ------------[ cut here ]------------
> UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:203:12
> index 65535 is out of range for type 'sfq_head [128]'

This (the 'index 65535') kinda screams "integer underflow". So certainly
looks like a kernel bug, yeah. Don't see any obvious reason why Jesper's
patch would trigger this; maybe Eric has an idea?
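
(Just to illustrate what the 65535 suggests: if I read sch_sfq.c right, the
slot indexes are 16-bit (sfq_index is a u16), and a 16-bit value that gets
decremented or subtracted past zero wraps around to 0xFFFF, i.e. exactly the
65535 that UBSAN reports. A minimal userspace example of the wrap, nothing
sfq-specific:)

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint16_t idx = 0;

	idx -= 1;				/* unsigned underflow: wraps to 0xFFFF */
	printf("%u\n", (unsigned int)idx);	/* prints 65535 */
	return 0;
}
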
Does this happen with other qdiscs as well, or is it specific to sfq?

-Toke