Message-ID: <b2b0bceacbd080e13b3aa7ad05569a787df4646d.camel@mediatek.com>
Date: Tue, 29 Oct 2024 02:20:51 +0000
From: Cheng-Jui Wang (王正睿)
<Cheng-Jui.Wang@...iatek.com>
To: "frederic@...nel.org" <frederic@...nel.org>, "paulmck@...nel.org"
<paulmck@...nel.org>, "rcu@...r.kernel.org" <rcu@...r.kernel.org>
CC: wsd_upstream <wsd_upstream@...iatek.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "kernel-team@...a.com"
<kernel-team@...a.com>, Bobule Chang (張弘義)
<bobule.chang@...iatek.com>, "rostedt@...dmis.org" <rostedt@...dmis.org>,
Cheng-Jui Wang (王正睿)
<Cheng-Jui.Wang@...iatek.com>, "joel@...lfernandes.org"
<joel@...lfernandes.org>
Subject: Re: [PATCH v3 rcu 3/3] rcu: Finer-grained grace-period-end checks in
rcu_dump_cpu_stacks()
On Mon, 2024-10-28 at 17:22 -0700, Paul E. McKenney wrote:
> The result is that the current leaf rcu_node structure's ->lock is
> acquired only if a stack backtrace might be needed from the current CPU,
> and is held across only that CPU's backtrace. As a result, if there are
After upgrading our device to kernel-6.11, we encountered a lockup
scenario under stall warning.
I had prepared a patch to submit, but I noticed that this series has
already addressed some issues, though it hasn't been merged into the
mainline yet. So, I decided to reply to this series for discussion on
how to fix it before pushing. Here is the lockup scenario we
encountered:
Devices: arm64 with only 8 cores
One CPU holds rnp->lock in rcu_dump_cpu_stacks() while dumping the other
CPUs. For each target CPU it waits, with a 10-second timeout, for that
CPU to dump its own backtrace.
    __delay()
    __const_udelay()
    nmi_trigger_cpumask_backtrace()
    arch_trigger_cpumask_backtrace()
    trigger_single_cpu_backtrace()
    dump_cpu_task()
    rcu_dump_cpu_stacks()    <- holding rnp->lock
    print_other_cpu_stall()
    check_cpu_stall()
    rcu_pending()
    rcu_sched_clock_irq()
    update_process_times()
However, the other 7 CPUs are waiting for rnp->lock on the path to
report qs.
    queued_spin_lock_slowpath()
    queued_spin_lock()
    do_raw_spin_lock()
    __raw_spin_lock_irqsave()
    _raw_spin_lock_irqsave()
    rcu_report_qs_rdp()
    rcu_check_quiescent_state()
    rcu_core()
    rcu_core_si()
    handle_softirqs()
    __do_softirq()
    ____do_softirq()
    call_on_irq_stack()
Because arm64 implements arch_trigger_cpumask_backtrace() with an IPI
rather than a true NMI, the interrupt disabling done by
spin_lock_irqsave() is enough to block the backtrace request.
Therefore, if other CPUs start waiting for the lock before they receive
the IPI, a semi-deadlock scenario like the following occurs:
    CPU0                      CPU1                      CPU2
    -----                     -----                     -----
    lock_irqsave(rnp->lock)
                              lock_irqsave(rnp->lock)
                              <can't receive IPI>
    <send ipi to CPU 1>
    <wait CPU 1 for 10s>
                                                        lock_irqsave(rnp->lock)
                                                        <can't receive IPI>
    <send ipi to CPU 2>
    <wait CPU 2 for 10s>
    ...
In our scenario, with 7 CPUs to dump at up to 10 seconds each, the
lockup lasts nearly 70 seconds. This prevents subsequent useful logs
from being printed and leads to a watchdog timeout and system reboot.
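The 70-second figure is simply the per-CPU timeout added up over the
blocked CPUs. A minimal sketch of that arithmetic follows. The helper
name worst_case_lockup_secs() and the macro are hypothetical and exist
only for illustration; they are not kernel symbols, and the 10-second
value is taken from the scenario described above.

```c
/* Assumed from the report above: each backtrace wait gives up after 10 s. */
#define BACKTRACE_TIMEOUT_SECS 10U

/*
 * Hypothetical helper, not a kernel function.
 *
 * Upper bound on how long the dumping CPU keeps waiting when every one of
 * `blocked_cpus` targets is spinning on rnp->lock with interrupts disabled
 * and therefore never answers the backtrace IPI.
 */
unsigned int worst_case_lockup_secs(unsigned int blocked_cpus,
                                    unsigned int timeout_secs)
{
    /* The dumps are issued one after another, so the timeouts add up. */
    return blocked_cpus * timeout_secs;
}
```

For the 8-core device described here, worst_case_lockup_secs(7,
BACKTRACE_TIMEOUT_SECS) evaluates to 70. If only a single CPU is stuck
behind the lock, the same bound gives 10 seconds.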
This series of changes re-acquires the lock after each dump,
significantly reducing lock-holding time. However, since it still holds
the lock while dumping CPU backtrace, there's still a chance for two
CPUs to wait for each other for 10 seconds, which is still too long.
So, I would like to ask whether it is necessary to dump the backtrace
inside the spinlock critical section.
If not, especially now that lockless checks are possible, maybe it can
be changed as follows?
-		if (!(data_race(rnp->qsmask) & leaf_node_cpu_bit(rnp, cpu)))
-			continue;
-		raw_spin_lock_irqsave_rcu_node(rnp, flags);
-		if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
+		if (data_race(rnp->qsmask) & leaf_node_cpu_bit(rnp, cpu)) {
 			if (cpu_is_offline(cpu))
 				pr_err("Offline CPU %d blocking current GP.\n", cpu);
 			else
 				dump_cpu_task(cpu);
 		}
 	}
-	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
Or should this be considered an arm64 issue, meaning that arm64 should
either switch to true NMI or stop using
nmi_trigger_cpumask_backtrace()?