lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
 <TY2PR01MB46816118EEA6A938320A3421F598A@TY2PR01MB4681.jpnprd01.prod.outlook.com>
Date: Fri, 23 May 2025 07:07:45 +0000
From: fengtian guo <fengtian_guo@...mail.com>
To: Steven Rostedt <rostedt@...dmis.org>
CC: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
	<vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Sebastian
 Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH RT] Possible spinlock deadlock in kernel/sched/rt.c under
 high load


Hi Steve you comment is right and Thanks very much

1)5.10 kernel not init with IRQ_WORK_INIT_HARD and this fix on 6.6
So the bug exist from 5.10 to 6.x ? some version of kernel

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 10b386164..a3014353f 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -513,7 +513,7 @@ static int init_rootdomain(struct root_domain *rd)
 #ifdef HAVE_RT_PUSH_IPI
        rd->rto_cpu = -1;
        raw_spin_lock_init(&rd->rto_lock);
-       init_irq_work(&rd->rto_push_work, rto_push_irq_work_func);
+       rd->rto_push_work = IRQ_WORK_INIT_HARD(rto_push_irq_work_func);
        atomic_or(IRQ_WORK_HARD_IRQ, &rd->rto_push_work.flags);
 #endif

1)Without fix and the callstack is following and rto_push_irq_work_func run on irq_work thread context and irq is not disabled

    [871.101517][ C12] rto_push_irq_work_func+0x180/0x1ec
    [871.101519][ C12] irq_work_single+0x38/0xb4
    [871.101522][ C12] irq_work_run_list+0x48/0x60
    [871.101525][ C12] run_irq_workd+0x30/0x40
    [871.101527][ C12] smpboot_thread_fn+0x278/0x300
    [871.101530][ C12] kthread+0x170/0x19c
    [871.101532][ C12] ret_from_fork+0x10/0x18

2)  after the patch the call stack rto_push_irq_work_func  run on ipi interrupt context .
The irq_work is queued to raised list and notify the target cpu with ipi interrupt and the target cpu process the irq_work event

[Thu Jan 1 08:52:55 1970] rto_push_irq_work_func+0x23c/0x240
[Thu Jan 1 08:52:55 1970] irq_work_single+0x38/0xb4
[Thu Jan 1 08:52:55 1970] flush_smp_call_function_queue+0x14c/0x25c
[Thu Jan 1 08:52:55 1970] generic_smp_call_function_single_interrupt+0x1c/0x30
[Thu Jan 1 08:52:55 1970] do_handle_IPI+0x74/0x36c
[Thu Jan 1 08:52:55 1970] ipi_handler+0x24/0x34
[Thu Jan 1 08:52:55 1970] handle_percpu_devid_fasteoi_ipi+0xa8/0x204
[Thu Jan 1 08:52:55 1970] __handle_domain_irq+0xb8/0x13c
[Thu Jan 1 08:52:55 1970] gic_handle_irq+0x78/0x2d0
[Thu Jan 1 08:52:55 1970] el1_irq+0xb8/0x180

3) the attachment is patch

fangchunguo@...ntu:~/fengtian/linux_git$ git show da6ff09943491
commit da6ff09943491819e077b94c284bf0a6b751c9b8
Author: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Date:   Wed Oct 6 13:18:49 2021 +0200

    sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ

    The push-IPI logic for RT tasks expects to be invoked from hardirq
    context. One reason is that a RT task on the remote CPU would block the
    softirq processing on PREEMPT_RT and so avoid pulling / balancing the RT
    tasks as intended.

    Annotate root_domain::rto_push_work as IRQ_WORK_HARD_IRQ.

    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
    Link: https://lkml.kernel.org/r/20211006111852.1514359-2-bigeasy@linutronix.de

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index c1729f9a715f..e81246787560 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -526,7 +526,7 @@ static int init_rootdomain(struct root_domain *rd)
 #ifdef HAVE_RT_PUSH_IPI
        rd->rto_cpu = -1;
        raw_spin_lock_init(&rd->rto_lock);
-       init_irq_work(&rd->rto_push_work, rto_push_irq_work_func);
+       rd->rto_push_work = IRQ_WORK_INIT_HARD(rto_push_irq_work_func);
 #endif

        rd->visit_gen = 0;


________________________________
From: Steven Rostedt <rostedt@...dmis.org>
Sent: Wednesday, May 21, 2025 9:52 PM
To: fengtian guo <fengtian_guo@...mail.com>
Cc: Ingo Molnar <mingo@...hat.com>; Peter Zijlstra <peterz@...radead.org>; Juri Lelli <juri.lelli@...hat.com>; Vincent Guittot <vincent.guittot@...aro.org>; Dietmar Eggemann <dietmar.eggemann@....com>; Ben Segall <bsegall@...gle.com>; Mel Gorman <mgorman@...e.de>; linux-kernel@...r.kernel.org <linux-kernel@...r.kernel.org>; Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH RT] Possible spinlock deadlock in kernel/sched/rt.c under high load

On Wed, 21 May 2025 10:35:53 +0000
fengtian guo <fengtian_guo@...mail.com> wrote:

> hardward: On arm64 with 32 cores
>
> First Deadlock Root Cause Analysis
> The initial deadlock occurs due to
> unprotected spinlock access between
> an IRQ work thread and a hardware interrupt on the same CPU
> Here is the critical path:
> Deadlock Sequence
> IRQ Work Thread Context (RT priority):
>
> irq_work → rto_push_irq_work_func → raw_spin_lock(&rq->lock) → push_rt_task
> The rto_push_irq_work_func thread acquires rq->lock without disabling interrupts

rto_push_irq_work_func() must be called with interrupts disabled. If it is
not, then that's a bug in the implementation of irq_work!

>
> Hardware Interrupt Context (Clock timer):
> hrtimer_interrupt → __hrtimer_run_queues → _run_hrtimer → hrtimer_wakeup →
> try_to_wake_up → ttwu_queue → raw_spin_lock(&rq->lock)
>
> The clock interrupt preempts the IRQ work thread while it holds rq->lock.
> The interrupt handler attempts to acquire the same rq->lock via ttwu_queue
> , causing a double-lock deadlock.



> Signed-off-by: Fengtian Guo <fengtian_guo@...mail.com>
> ---
>  kernel/sched/rt.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 5dc1ee8dc..52a2e7bce 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2131,6 +2131,7 @@ void rto_push_irq_work_func(struct irq_work *work)
>                 container_of(work, struct root_domain, rto_push_work);
>         struct rq *rq;
>         int cpu;
> +       unsigned long flags;
>
>         rq = this_rq();
>
> @@ -2139,10 +2140,10 @@ void rto_push_irq_work_func(struct irq_work *work)
>          * When it gets updated, a check is made if a push is possible.
>          */
>         if (has_pushable_tasks(rq)) {
> -               raw_spin_lock(&rq->lock);
> +               raw_spin_lock_irqsave(&rq->lock, flags);
>                 while (push_rt_task(rq, true))
>                         ;
> -               raw_spin_unlock(&rq->lock);
> +               raw_spin_unlock_irqrestore(&rq->lock, flags);

interrupts should *NEVER* be enabled here!

>         }
>
>         raw_spin_lock(&rd->rto_lock);
> --

In kernel/sched/topology.c we have:

        rd->rto_push_work = IRQ_WORK_INIT_HARD(rto_push_irq_work_func);

That IRQ_WORK_INIT_HARD() means that this function must always be called
from hard interrupt context (or interrupts disabled). Even when PREEMPT_RT
is enabled.

If the irq_work is being called without interrupts disabled, there's a bug
somewhere else.

NACK on this patch, because its fixing a symptom of the bug and not the bug
itself.

The question is, how did this get called as a normal irq_work and not one
that was marked as "HARD"?

-- Steve

Content of type "text/html" skipped

Download attachment "0001-PATCH-RT-Possible-spinlock-deadlock-in-kernel-sched-.patch" of type "application/octet-stream" (7768 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ