[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAM0EoMmry8eGK6f7J8LCot==gftE4PUTQZ8gMD2ZwV8HqOC-qA@mail.gmail.com>
Date: Fri, 15 Aug 2025 04:45:20 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Victor Nogueira <victor@...atatu.com>
Cc: davem@...emloft.net, kuba@...nel.org, edumazet@...gle.com,
xiyou.wangcong@...il.com, jiri@...nulli.us, netdev@...r.kernel.org,
chia-yu.chang@...ia-bell-labs.com, koen.de_schepper@...ia-bell-labs.com
Subject: Re: [RFC PATCH net] net/sched: sch_dualpi2: Run prob update timer in
softirq to avoid deadlock
On Wed, Aug 13, 2025 at 1:46 PM Victor Nogueira <victor@...atatu.com> wrote:
>
> When a user creates a dualpi2 qdisc it automatically sets a timer. This
> timer will run constantly and update the qdisc's probability field.
> The issue is that the timer acquires the qdisc root lock and runs in
> hardirq. The qdisc root lock is also acquired in dev.c whenever a packet
> arrives for this qdisc. Since the dualpi2 timer callback runs in hardirq,
> it may interrupt the packet processing running in softirq. If that happens
> and it runs on the same CPU, it will acquire the same lock and cause a
> deadlock. The following splat shows up when running a kernel compiled with
> lock debugging:
>
> [ +0.000224] WARNING: inconsistent lock state
> [ +0.000224] 6.16.0+ #10 Not tainted
> [ +0.000169] --------------------------------
> [ +0.000029] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [ +0.000000] ping/156 [HC0[0]:SC0[2]:HE1:SE0] takes:
> [ +0.000000] ffff897841242110 (&sch->root_lock_key){?.-.}-{3:3}, at: __dev_queue_xmit+0x86d/0x1140
> [ +0.000000] {IN-HARDIRQ-W} state was registered at:
> [ +0.000000] lock_acquire.part.0+0xb6/0x220
> [ +0.000000] _raw_spin_lock+0x31/0x80
> [ +0.000000] dualpi2_timer+0x6f/0x270
> [ +0.000000] __hrtimer_run_queues+0x1c5/0x360
> [ +0.000000] hrtimer_interrupt+0x115/0x260
> [ +0.000000] __sysvec_apic_timer_interrupt+0x6d/0x1a0
> [ +0.000000] sysvec_apic_timer_interrupt+0x6e/0x80
> [ +0.000000] asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [ +0.000000] pv_native_safe_halt+0xf/0x20
> [ +0.000000] default_idle+0x9/0x10
> [ +0.000000] default_idle_call+0x7e/0x1e0
> [ +0.000000] do_idle+0x1e8/0x250
> [ +0.000000] cpu_startup_entry+0x29/0x30
> [ +0.000000] rest_init+0x151/0x160
> [ +0.000000] start_kernel+0x6f3/0x700
> [ +0.000000] x86_64_start_reservations+0x24/0x30
> [ +0.000000] x86_64_start_kernel+0xc8/0xd0
> [ +0.000000] common_startup_64+0x13e/0x148
> [ +0.000000] irq event stamp: 6884
> [ +0.000000] hardirqs last enabled at (6883): [<ffffffffa75700b3>] neigh_resolve_output+0x223/0x270
> [ +0.000000] hardirqs last disabled at (6882): [<ffffffffa7570078>] neigh_resolve_output+0x1e8/0x270
> [ +0.000000] softirqs last enabled at (6880): [<ffffffffa757006b>] neigh_resolve_output+0x1db/0x270
> [ +0.000000] softirqs last disabled at (6884): [<ffffffffa755b533>] __dev_queue_xmit+0x73/0x1140
> [ +0.000000]
> other info that might help us debug this:
> [ +0.000000] Possible unsafe locking scenario:
>
> [ +0.000000] CPU0
> [ +0.000000] ----
> [ +0.000000] lock(&sch->root_lock_key);
> [ +0.000000] <Interrupt>
> [ +0.000000] lock(&sch->root_lock_key);
> [ +0.000000]
> *** DEADLOCK ***
>
> [ +0.000000] 4 locks held by ping/156:
> [ +0.000000] #0: ffff897842332e08 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x41e/0xf40
> [ +0.000000] #1: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_output+0x2c/0x190
> [ +0.000000] #2: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_finish_output2+0xad/0x950
> [ +0.000000] #3: ffffffffa816f840 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x73/0x1140
>
> I am able to reproduce it consistently when running the following:
>
> tc qdisc add dev lo handle 1: root dualpi2
> ping -f 127.0.0.1
>
> To fix it, make the timer run in softirq.
>
> Fixes: 320d031ad6e4 ("sched: Struct definition and parsing of dualpi2 qdisc")
> Signed-off-by: Victor Nogueira <victor@...atatu.com>
> ---
> net/sched/sch_dualpi2.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c
> index 845375ebd4ea..4b975feb52b1 100644
> --- a/net/sched/sch_dualpi2.c
> +++ b/net/sched/sch_dualpi2.c
> @@ -927,7 +927,8 @@ static int dualpi2_init(struct Qdisc *sch, struct nlattr *opt,
>
> q->sch = sch;
> dualpi2_reset_default(sch);
> - hrtimer_setup(&q->pi2_timer, dualpi2_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
> + hrtimer_setup(&q->pi2_timer, dualpi2_timer, CLOCK_MONOTONIC,
> + HRTIMER_MODE_ABS_PINNED_SOFT);
>
> if (opt && nla_len(opt)) {
> err = dualpi2_change(sch, opt, extack);
> @@ -937,7 +938,7 @@ static int dualpi2_init(struct Qdisc *sch, struct nlattr *opt,
> }
>
> hrtimer_start(&q->pi2_timer, next_pi2_timeout(q),
> - HRTIMER_MODE_ABS_PINNED);
> + HRTIMER_MODE_ABS_PINNED_SOFT);
> return 0;
> }
>
Reviewed-by: Jamal Hadi Salim <jhs@...atatu.com>
cheers,
jamal
> --
> 2.34.1
>
Powered by blists - more mailing lists