Message-ID: <CANn89i+7crgdpf-UXDpTNdWfei95+JHyMD_dBD8efTbLBnvZUQ@mail.gmail.com>
Date: Thu, 5 Jun 2025 09:46:12 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>
Cc: Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de>, 
	Jesper Dangaard Brouer <hawk@...nel.org>, bpf@...r.kernel.org, netdev@...r.kernel.org, 
	Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	John Fastabend <john.fastabend@...il.com>, Andrew Lunn <andrew+netdev@...n.ch>, 
	"David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, 
	Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang <xiyou.wangcong@...il.com>, 
	Jiri Pirko <jiri@...nulli.us>, linux-kernel@...r.kernel.org
Subject: Re: [BUG] veth: TX drops with NAPI enabled and crash in combination
 with qdisc

On Thu, Jun 5, 2025 at 9:15 AM Toke Høiland-Jørgensen <toke@...hat.com> wrote:
>
> Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de> writes:
>
> > Hi,
> >
> > while experimenting with XDP_REDIRECT from a veth-pair to another interface, I
> > noticed that the veth-pair loses lots of packets when multiple TCP streams go
> > through it, resulting in stalling TCP connections and noticeable instabilities.
> >
> > This doesn't seem to be an issue with just XDP but rather occurs whenever the
> > NAPI mode of the veth driver is active.
> > I managed to reproduce the same behavior just by bringing the veth-pair into
> > NAPI mode (see commit d3256efd8e8b ("veth: allow enabling NAPI even without
> > XDP")) and running multiple TCP streams through it using a network namespace.
> >
> > Here is how I reproduced it:
> >
> >   ip netns add lb
> >   ip link add dev to-lb type veth peer name in-lb netns lb
> >
> >   # Enable NAPI
> >   ethtool -K to-lb gro on
> >   ethtool -K to-lb tso off
> >   ip netns exec lb ethtool -K in-lb gro on
> >   ip netns exec lb ethtool -K in-lb tso off
> >
> >   ip link set dev to-lb up
> >   ip -netns lb link set dev in-lb up
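> >
> > (This relies on GRO switching the veth pair into NAPI mode; one way to
> > double-check that it is actually enabled on both peers is something like:
> >
> >   ethtool -k to-lb | grep generic-receive-offload
> >   ip netns exec lb ethtool -k in-lb | grep generic-receive-offload
> >
> > which should report "generic-receive-offload: on".)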
> >
> > Then run an HTTP server inside the "lb" namespace that serves a large file:
> >
> >   fallocate -l 10G testfiles/10GB.bin
> >   caddy file-server --root testfiles/
> >
> > Download this file from within the root namespace multiple times in parallel:
> >
> >   curl http://[fe80::...%to-lb]/10GB.bin -o /dev/null
> >
> > In my tests, I ran four parallel curls at the same time and after just a few
> > seconds, three of them stalled while the other one "won" the full bandwidth
> > and completed the download.
> >
> > This is probably a result of the veth's ptr_ring running full, causing many
> > packet drops on TX, and the TCP congestion control reacting to that.
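> >
> > (The drops show up as a growing "dropped" counter in the TX statistics of the
> > sending peer, which can be watched with e.g.:
> >
> >   ip netns exec lb ip -s link show dev in-lb
> > )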
> >
> > In this context, I also took notice of Jesper's patch which describes a very
> > similar issue and should help to resolve this:
> >   commit dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to
> >   reduce TX drops")
> >
> > But when repeating the above test with the latest mainline, which includes this
> > patch, and enabling a qdisc via
> >   tc qdisc add dev in-lb root sfq perturb 10
> > the kernel crashed just after starting the second TCP stream (see output below).
> >
> > So I have two questions:
> > - Is my understanding of the described issue correct, and is Jesper's patch
> >   sufficient to solve this?
>
> Hmm, yeah, this does sound likely.
>
> > - Is my qdisc configuration to make use of this patch correct, and is the
> >   kernel crash likely a bug?
> >
> > ------------[ cut here ]------------
> > UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:203:12
> > index 65535 is out of range for type 'sfq_head [128]'
>
> This (the 'index 65535') kinda screams "integer underflow". So it certainly
> looks like a kernel bug, yeah. I don't see any obvious reason why Jesper's
> patch would trigger this; maybe Eric has an idea?
>
> Does this happen with other qdiscs as well, or is it specific to sfq?
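>
> (E.g., repeating the test after swapping the root qdisc with something like
>
>   tc qdisc replace dev in-lb root fq_codel
>
> would tell us whether the crash is sfq-specific.)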

This seems like a bug in sfq; we already had recent fixes in it, and other
fixes in net/sched around qdisc_tree_reduce_backlog().

It is possible qdisc_pkt_len() could be wrong in this use case (TSO off?).
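
A quick way to sanity-check that accounting would be to compare the sent/backlog
byte and packet counters reported by e.g.

  tc -s qdisc show dev in-lb

while the parallel streams are running.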
