Message-ID: <CANn89i+7crgdpf-UXDpTNdWfei95+JHyMD_dBD8efTbLBnvZUQ@mail.gmail.com>
Date: Thu, 5 Jun 2025 09:46:12 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>
Cc: Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de>,
Jesper Dangaard Brouer <hawk@...nel.org>, bpf@...r.kernel.org, netdev@...r.kernel.org,
Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>, Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>, linux-kernel@...r.kernel.org
Subject: Re: [BUG] veth: TX drops with NAPI enabled and crash in combination
with qdisc
On Thu, Jun 5, 2025 at 9:15 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>
> Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de> writes:
>
> > Hi,
> >
> > while experimenting with XDP_REDIRECT from a veth-pair to another interface, I
> > noticed that the veth-pair loses lots of packets when multiple TCP streams go
> > through it, resulting in stalling TCP connections and noticeable instabilities.
> >
> > This doesn't seem to be an issue with just XDP but rather occurs whenever the
> > NAPI mode of the veth driver is active.
> > I managed to reproduce the same behavior just by bringing the veth-pair into
> > NAPI mode (see commit d3256efd8e8b ("veth: allow enabling NAPI even without
> > XDP")) and running multiple TCP streams through it using a network namespace.
> >
> > Here is how I reproduced it:
> >
> > ip netns add lb
> > ip link add dev to-lb type veth peer name in-lb netns lb
> >
> > # Enable NAPI
> > ethtool -K to-lb gro on
> > ethtool -K to-lb tso off
> > ip netns exec lb ethtool -K in-lb gro on
> > ip netns exec lb ethtool -K in-lb tso off
> >
> > ip link set dev to-lb up
> > ip -netns lb link set dev in-lb up
> >
> > Then run an HTTP server inside the "lb" namespace that serves a large file:
> >
> > fallocate -l 10G testfiles/10GB.bin
> > caddy file-server --root testfiles/
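> >
> > One way to run it inside the namespace (assuming caddy's default listen
> > address, and running from the directory that contains testfiles/) is via
> > ip netns exec:
> >
> > ip netns exec lb caddy file-server --root testfiles/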
> >
> > Download this file from within the root namespace multiple times in parallel:
> >
> > curl http://[fe80::...%to-lb]/10GB.bin -o /dev/null
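> >
> > e.g. to run several of them in parallel (same elided link-local address as
> > above):
> >
> > for i in 1 2 3 4; do
> >     curl "http://[fe80::...%to-lb]/10GB.bin" -o /dev/null &
> > done
> > wait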
> >
> > In my tests, I ran four curls in parallel; after just a few seconds, three of
> > them stalled while the remaining one "won" the full bandwidth and completed
> > the download.
> >
> > This is probably a result of the veth's ptr_ring running full, causing many
> > packet drops on TX, and the TCP congestion control reacting to that.
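> >
> > The drops should be visible in the "dropped" counters of the interface
> > statistics on one side or the other, e.g.:
> >
> > ip -s link show dev to-lb
> > ip netns exec lb ip -s link show dev in-lb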
> >
> > In this context, I also took notice of Jesper's patch which describes a very
> > similar issue and should help to resolve this:
> > commit dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to
> > reduce TX drops")
> >
> > But when repeating the above test with the latest mainline kernel, which
> > includes this patch, and adding a qdisc via
> > tc qdisc add dev in-lb root sfq perturb 10
> > the kernel crashed just after starting the second TCP stream (see output below).
> >
> > So I have two questions:
> > - Is my understanding of the described issue correct and is Jesper's patch
> > sufficient to solve this?
>
> Hmm, yeah, this does sound likely.
>
> > - Is my qdisc configuration correct for making use of this patch, and is the
> > kernel crash likely a bug?
> >
> > ------------[ cut here ]------------
> > UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:203:12
> > index 65535 is out of range for type 'sfq_head [128]'
>
> This (the 'index 65535') kinda screams "integer underflow" (65535 is 0xFFFF,
> i.e. what an unsigned 16-bit index wraps to after going below zero). So it
> certainly looks like a kernel bug, yeah. I don't see any obvious reason why
> Jesper's patch would trigger this; maybe Eric has an idea?
>
> Does this happen with other qdiscs as well, or is it specific to sfq?
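>
> A quick way to check might be to swap in a different qdisc for the otherwise
> unchanged test, e.g.:
>
> tc qdisc replace dev in-lb root fq_codel
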
This seems like a bug in sfq; we already had recent fixes in it, and other
fixes in net/sched around qdisc_tree_reduce_backlog().

It is possible qdisc_pkt_len() is wrong in this use case (TSO off?).
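
It may also be worth double-checking the offload state on the in-lb side for
the reproduction above, e.g. something like:

ip netns exec lb ethtool -k in-lb | grep -E 'segmentation-offload|generic-receive-offload'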