lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAM0EoMmry8eGK6f7J8LCot==gftE4PUTQZ8gMD2ZwV8HqOC-qA@mail.gmail.com>
Date: Fri, 15 Aug 2025 04:45:20 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Victor Nogueira <victor@...atatu.com>
Cc: davem@...emloft.net, kuba@...nel.org, edumazet@...gle.com, 
	xiyou.wangcong@...il.com, jiri@...nulli.us, netdev@...r.kernel.org, 
	chia-yu.chang@...ia-bell-labs.com, koen.de_schepper@...ia-bell-labs.com
Subject: Re: [RFC PATCH net] net/sched: sch_dualpi2: Run prob update timer in
 softirq to avoid deadlock

On Wed, Aug 13, 2025 at 1:46 PM Victor Nogueira <victor@...atatu.com> wrote:
>
> When a user creates a dualpi2 qdisc it automatically sets a timer. This
> timer will run constantly and update the qdisc's probability field.
> The issue is that the timer acquires the qdisc root lock and runs in
> hardirq. The qdisc root lock is also acquired in dev.c whenever a packet
> arrives for this qdisc. Since the dualpi2 timer callback runs in hardirq,
> it may interrupt the packet processing running in softirq. If that happens
> and it runs on the same CPU, it will acquire the same lock and cause a
> deadlock. The following splat shows up when running a kernel compiled with
> lock debugging:
>
> [  +0.000224] WARNING: inconsistent lock state
> [  +0.000224] 6.16.0+ #10 Not tainted
> [  +0.000169] --------------------------------
> [  +0.000029] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
> [  +0.000000] ping/156 [HC0[0]:SC0[2]:HE1:SE0] takes:
> [  +0.000000] ffff897841242110 (&sch->root_lock_key){?.-.}-{3:3}, at: __dev_queue_xmit+0x86d/0x1140
> [  +0.000000] {IN-HARDIRQ-W} state was registered at:
> [  +0.000000]   lock_acquire.part.0+0xb6/0x220
> [  +0.000000]   _raw_spin_lock+0x31/0x80
> [  +0.000000]   dualpi2_timer+0x6f/0x270
> [  +0.000000]   __hrtimer_run_queues+0x1c5/0x360
> [  +0.000000]   hrtimer_interrupt+0x115/0x260
> [  +0.000000]   __sysvec_apic_timer_interrupt+0x6d/0x1a0
> [  +0.000000]   sysvec_apic_timer_interrupt+0x6e/0x80
> [  +0.000000]   asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [  +0.000000]   pv_native_safe_halt+0xf/0x20
> [  +0.000000]   default_idle+0x9/0x10
> [  +0.000000]   default_idle_call+0x7e/0x1e0
> [  +0.000000]   do_idle+0x1e8/0x250
> [  +0.000000]   cpu_startup_entry+0x29/0x30
> [  +0.000000]   rest_init+0x151/0x160
> [  +0.000000]   start_kernel+0x6f3/0x700
> [  +0.000000]   x86_64_start_reservations+0x24/0x30
> [  +0.000000]   x86_64_start_kernel+0xc8/0xd0
> [  +0.000000]   common_startup_64+0x13e/0x148
> [  +0.000000] irq event stamp: 6884
> [  +0.000000] hardirqs last  enabled at (6883): [<ffffffffa75700b3>] neigh_resolve_output+0x223/0x270
> [  +0.000000] hardirqs last disabled at (6882): [<ffffffffa7570078>] neigh_resolve_output+0x1e8/0x270
> [  +0.000000] softirqs last  enabled at (6880): [<ffffffffa757006b>] neigh_resolve_output+0x1db/0x270
> [  +0.000000] softirqs last disabled at (6884): [<ffffffffa755b533>] __dev_queue_xmit+0x73/0x1140
> [  +0.000000]
>               other info that might help us debug this:
> [  +0.000000]  Possible unsafe locking scenario:
>
> [  +0.000000]        CPU0
> [  +0.000000]        ----
> [  +0.000000]   lock(&sch->root_lock_key);
> [  +0.000000]   <Interrupt>
> [  +0.000000]     lock(&sch->root_lock_key);
> [  +0.000000]
>                *** DEADLOCK ***
>
> [  +0.000000] 4 locks held by ping/156:
> [  +0.000000]  #0: ffff897842332e08 (sk_lock-AF_INET){+.+.}-{0:0}, at: raw_sendmsg+0x41e/0xf40
> [  +0.000000]  #1: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_output+0x2c/0x190
> [  +0.000000]  #2: ffffffffa816f880 (rcu_read_lock){....}-{1:3}, at: ip_finish_output2+0xad/0x950
> [  +0.000000]  #3: ffffffffa816f840 (rcu_read_lock_bh){....}-{1:3}, at: __dev_queue_xmit+0x73/0x1140
>
> I am able to reproduce it consistently when running the following:
>
> tc qdisc add dev lo handle 1: root dualpi2
> ping -f 127.0.0.1
>
> To fix it, make the timer run in softirq.
>
> Fixes: 320d031ad6e4 ("sched: Struct definition and parsing of dualpi2 qdisc")
> Signed-off-by: Victor Nogueira <victor@...atatu.com>
> ---
>  net/sched/sch_dualpi2.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c
> index 845375ebd4ea..4b975feb52b1 100644
> --- a/net/sched/sch_dualpi2.c
> +++ b/net/sched/sch_dualpi2.c
> @@ -927,7 +927,8 @@ static int dualpi2_init(struct Qdisc *sch, struct nlattr *opt,
>
>         q->sch = sch;
>         dualpi2_reset_default(sch);
> -       hrtimer_setup(&q->pi2_timer, dualpi2_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_PINNED);
> +       hrtimer_setup(&q->pi2_timer, dualpi2_timer, CLOCK_MONOTONIC,
> +                     HRTIMER_MODE_ABS_PINNED_SOFT);
>
>         if (opt && nla_len(opt)) {
>                 err = dualpi2_change(sch, opt, extack);
> @@ -937,7 +938,7 @@ static int dualpi2_init(struct Qdisc *sch, struct nlattr *opt,
>         }
>
>         hrtimer_start(&q->pi2_timer, next_pi2_timeout(q),
> -                     HRTIMER_MODE_ABS_PINNED);
> +                     HRTIMER_MODE_ABS_PINNED_SOFT);
>         return 0;
>  }
>

Reviewed-by: Jamal Hadi Salim <jhs@...atatu.com>

cheers,
jamal
> --
> 2.34.1
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ