[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20250813174617.257553-1-victor@mojatatu.com>
Date: Wed, 13 Aug 2025 14:46:17 -0300
From: Victor Nogueira <victor@...atatu.com>
To: davem@...emloft.net,
kuba@...nel.org,
edumazet@...gle.com,
jhs@...atatu.com,
xiyou.wangcong@...il.com,
jiri@...nulli.us
Cc: netdev@...r.kernel.org,
chia-yu.chang@...ia-bell-labs.com,
koen.de_schepper@...ia-bell-labs.com
Subject: [RFC PATCH net] net/sched: sch_dualpi2: Run prob update timer in softirq to avoid deadlock
When a user creates a dualpi2 qdisc it automatically sets a timer. This
timer will run constantly and update the qdisc's probability field.
The issue is that the timer acquires the qdisc root lock and runs in
hardirq. The qdisc root lock is also acquired in dev.c whenever a packet
arrives for this qdisc. Since the dualpi2 timer callback runs in hardirq,
it may interrupt the packet processing running in softirq. If that happens
and it runs on the same CPU, it will acquire the same lock and cause a
deadlock. The following splat shows up when running a kernel compiled with
lock debugging:
[ +0.000224] WARNING: inconsistent lock state
[ +0.000224] 6.16.0+ #10 Not tainted
[ +0.000169] --------------------------------
[ +0.000029] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[ +0.000000] ping/156 [HC0[0]:SC0[2]:HE1:SE0] takes:
[ +0.000000] ffff897841242110 (&sch->root_lock_key){?.-.}-{3:3}, at: __dev_queue_xmit+0x86d/0x1140
[ +0.000000] {IN-HARDIRQ-W} state was registered at:
[ +0.000000] lock_acquire.part.0+0xb6/0x220
[ +0.000000] _raw_spin_lock+0x31/0x80
[ +0.000000] dualpi2_timer+0x6f/0x270
[ +0.000000] __hrtimer_run_queues+0x1c5/0x360
[ +0.000000] hrtimer_interrupt+0x115/0x260
[ +0.000000] __sysvec_apic_timer_interrupt+0x6d/0x1a0
[ +0.000000] sysvec_apic_timer_interrupt+0x6e/0x80
[ +0.000000] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ +0.000000] pv_native_safe_halt+0xf/0x20
[ +0.000000] default_idle+0x9/0x10
[ +0.000000] default_idle_call+0x7e/0x1e0
[ +0.000000] do_idle+0x1e8/0x250
[ +0.000000] cpu_startup_entry+0x29/0x30
[ +0.000000] rest_init+0x151/0x160
[ +0.000000] start_kernel+0x6f3/0x700
[ +0.000000] x86_64_start_reservations+0x24/0x30
[ +0.000000] x86_64_start_kernel+0xc8/0xd0
[ +0.000000] common_startup_64+0x13e/0x148
[ +0.000000] irq event stamp: 6884
[ +0.000000] hardirqs last enabled at (6883): [<ffffffffa75700b3>] neigh_resolve_output+0x223/0x270
[ +0.000000] hardirqs last disabled at (6882): [<ffffffffa7570078>] neigh_resolve_output+0x1e8/0x270
[ +0.000000] softirqs last enabled at (6880): [<ffffffffa757006b>] neigh_resolve_output+0x1db/0x270
[ +0.000000] softirqs last disabled at (6884): [<ffffffffa755b533>] __dev_queue_xmit+0x73/0x1140
[ +0.000000]
other info that might help us debug this:
[ +0.000000] Possible unsafe locking scenario:
[ +0.000000] CPU0
[ +0.000000] ----
[ +0.000000] lock(&sch->root_lock_key);
[ +0.000000] <Interrupt>
[ +0.000000] lock(&sch->root_lock_key);
[ +0.000000]
*** DEADLOCK ***
[ +0.000000] 4 locks held by ping/156:
[ +0.000000] #0: ffff897842332e08 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x41e/0xf40
[ +0.000000] #1: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_output+0x2c/0x190
[ +0.000000] #2: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_finish_output2+0xad/0x950
[ +0.000000] #3: ffffffffa816f840 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x73/0x1140
I am able to reproduce it consistently when running the following:
tc qdisc add dev lo handle 1: root dualpi2
ping -f 127.0.0.1
To fix it, make the timer run in softirq.
Fixes: 320d031ad6e4 ("sched: Struct definition and parsing of dualpi2 qdisc")
Signed-off-by: Victor Nogueira <victor@...atatu.com>
---
net/sched/sch_dualpi2.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c
index 845375ebd4ea..4b975feb52b1 100644
--- a/net/sched/sch_dualpi2.c
+++ b/net/sched/sch_dualpi2.c
@@ -927,7 +927,8 @@ static int dualpi2_init(struct Qdisc *sch, struct nlattr *opt,
q->sch = sch;
dualpi2_reset_default(sch);
- hrtimer_setup(&q->pi2_timer, dualpi2_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
+ hrtimer_setup(&q->pi2_timer, dualpi2_timer, CLOCK_MONOTONIC,
+ HRTIMER_MODE_ABS_PINNED_SOFT);
if (opt && nla_len(opt)) {
err = dualpi2_change(sch, opt, extack);
@@ -937,7 +938,7 @@ static int dualpi2_init(struct Qdisc *sch, struct nlattr *opt,
}
hrtimer_start(&q->pi2_timer, next_pi2_timeout(q),
- HRTIMER_MODE_ABS_PINNED);
+ HRTIMER_MODE_ABS_PINNED_SOFT);
return 0;
}
--
2.34.1
Powered by blists - more mailing lists