Message-ID: <9da42688-bfaa-4364-8797-e9271f3bdaef@hetzner-cloud.de>
Date: Wed, 4 Jun 2025 17:33:36 +0200
From: Marcus Wichelmann <marcus.wichelmann@...zner-cloud.de>
To: Jesper Dangaard Brouer <hawk@...nel.org>, bpf@...r.kernel.org,
netdev@...r.kernel.org
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>,
Andrew Lunn <andrew+netdev@...n.ch>, "David S. Miller"
<davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Jamal Hadi Salim <jhs@...atatu.com>, Cong Wang <xiyou.wangcong@...il.com>,
Jiri Pirko <jiri@...nulli.us>, linux-kernel@...r.kernel.org
Subject: [BUG] veth: TX drops with NAPI enabled and crash in combination with
qdisc
Hi,
while experimenting with XDP_REDIRECT from a veth-pair to another interface, I
noticed that the veth-pair loses lots of packets when multiple TCP streams go
through it, resulting in stalling TCP connections and noticeable instabilities.
This doesn't seem to be an issue with just XDP but rather occurs whenever the
NAPI mode of the veth driver is active.
I managed to reproduce the same behavior just by bringing the veth-pair into
NAPI mode (see commit d3256efd8e8b ("veth: allow enabling NAPI even without
XDP")) and running multiple TCP streams through it using a network namespace.
Here is how I reproduced it:
ip netns add lb
ip link add dev to-lb type veth peer name in-lb netns lb
# Enable NAPI
ethtool -K to-lb gro on
ethtool -K to-lb tso off
ip netns exec lb ethtool -K in-lb gro on
ip netns exec lb ethtool -K in-lb tso off
ip link set dev to-lb up
ip -netns lb link set dev in-lb up
Then run an HTTP server inside the "lb" namespace that serves a large file:
fallocate -l 10G testfiles/10GB.bin
caddy file-server --root testfiles/
Download this file from within the root namespace multiple times in parallel:
curl http://[fe80::...%to-lb]/10GB.bin -o /dev/null
In my tests, I ran four curls in parallel; after just a few seconds, three of
them stalled while the remaining one claimed the full bandwidth and completed
the download.
This is probably a result of the veth's ptr_ring running full, causing many
packet drops on TX, and the TCP congestion control reacting to that.
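For reference, this is the code path I suspect, quoted from drivers/net/veth.c
in simplified form (from memory, so details may differ): in NAPI mode the
sender enqueues into the peer's ptr_ring, and a full ring means the skb is
freed on the spot:

/* drivers/net/veth.c (simplified, before the backpressure patch):
 * veth_xmit() hands the skb to the peer's ring via veth_forward_skb().
 * When the ptr_ring is full, the skb is freed and only counted as a
 * TX drop; nothing tells the stack to slow down, so TCP just sees loss.
 */
static int veth_xdp_rx(struct veth_rq *rq, struct sk_buff *skb)
{
	if (unlikely(ptr_ring_produce(&rq->xdp_ring, skb))) {
		dev_kfree_skb_any(skb);
		return NET_RX_DROP;	/* sender sees this as a TX drop */
	}
	return NET_RX_SUCCESS;
}

The drops are visible in the TX counters of the sending side (in-lb in this
setup), e.g. via "ip -netns lb -s link show dev in-lb", while the streams stall.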
In this context, I also came across Jesper's patch, which describes a very
similar issue and should help resolve it:
commit dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to
reduce TX drops")
But when repeating the above test with the latest mainline, which includes this
patch, and enabling a qdisc via
tc qdisc add dev in-lb root sfq perturb 10
the kernel crashed just after starting the second TCP stream (see output below).
So I have two questions:
- Is my understanding of the described issue correct, and is Jesper's patch
sufficient to solve it?
- Is my qdisc configuration correct for making use of this patch, and is the
kernel crash therefore likely a bug?
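One observation on the splats below: index 65535 is 0xffff, which in
net/sched/sch_sfq.c (quoting the definitions from memory, so they may be
slightly off) is both the SFQ_EMPTY_SLOT marker and the value an unsigned
16-bit qlen would underflow to:

/* net/sched/sch_sfq.c (from memory): dep[] has SFQ_MAX_DEPTH + 1 = 128
 * entries, matching the 'sfq_head [128]' in the UBSAN report, while
 * 0xffff marks "no slot". So sfq_dequeue() apparently ends up using an
 * empty-slot marker (or an underflowed qlen) as a real index into dep[].
 */
#define SFQ_MAX_DEPTH	127	/* max number of packets per flow */
#define SFQ_EMPTY_SLOT	0xffff

struct sfq_sched_data {
	/* ... */
	struct sfq_head	dep[SFQ_MAX_DEPTH + 1];
};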
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:203:12
index 65535 is out of range for type 'sfq_head [128]'
CPU: 1 UID: 0 PID: 24 Comm: ksoftirqd/1 Not tainted 6.15.0+ #1 PREEMPT(voluntary)
Hardware name: GIGABYTE MP32-AR1-SW-HZ-001/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022
Call trace:
show_stack+0x24/0x50 (C)
dump_stack_lvl+0x80/0x140
dump_stack+0x1c/0x38
__ubsan_handle_out_of_bounds+0xd0/0x128
sfq_dequeue+0x37c/0x3e0 [sch_sfq]
__qdisc_run+0x90/0x760
net_tx_action+0x1b8/0x3b0
handle_softirqs+0x13c/0x418
run_ksoftirqd+0x9c/0xe8
smpboot_thread_fn+0x1c0/0x2e0
kthread+0x150/0x230
ret_from_fork+0x10/0x20
---[ end trace ]---
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in net/sched/sch_sfq.c:208:8
index 65535 is out of range for type 'sfq_head [128]'
CPU: 1 UID: 0 PID: 24 Comm: ksoftirqd/1 Not tainted 6.15.0+ #1 PREEMPT(voluntary)
Hardware name: GIGABYTE MP32-AR1-SW-HZ-001/MP32-AR1-00, BIOS F31n (SCP: 2.10.20220810) 09/30/2022
Call trace:
show_stack+0x24/0x50 (C)
dump_stack_lvl+0x80/0x140
dump_stack+0x1c/0x38
__ubsan_handle_out_of_bounds+0xd0/0x128
sfq_dequeue+0x394/0x3e0 [sch_sfq]
__qdisc_run+0x90/0x760
net_tx_action+0x1b8/0x3b0
handle_softirqs+0x13c/0x418
run_ksoftirqd+0x9c/0xe8
smpboot_thread_fn+0x1c0/0x2e0
kthread+0x150/0x230
ret_from_fork+0x10/0x20
---[ end trace ]---
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000005
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=000008002ad67000
[0000000000000005] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] SMP
CPU: Ampere(R) Altra(R) Processor Q80-30 CPU @ 3.0GHz
# tc qdisc
qdisc sfq 8001: dev in-lb root refcnt 81 limit 127p quantum 1514b depth 127 divisor 1024 perturb 10sec
Thanks,
Marcus