Message-ID: <ff361300-a390-651d-8316-1f4e8d390af3@samsung.com>
Date: Fri, 22 Oct 2021 15:46:29 +0200
From: Marek Szyprowski <m.szyprowski@...sung.com>
To: Peter Zijlstra <peterz@...radead.org>, gor@...ux.ibm.com,
jpoimboe@...hat.com, jikos@...nel.org, mbenes@...e.cz,
pmladek@...e.com, mingo@...nel.org
Cc: linux-kernel@...r.kernel.org, joe.lawrence@...hat.com,
fweisbec@...il.com, tglx@...utronix.de, hca@...ux.ibm.com,
svens@...ux.ibm.com, sumanthk@...ux.ibm.com,
live-patching@...r.kernel.org, paulmck@...nel.org,
rostedt@...dmis.org, x86@...nel.org
Subject: Re: [PATCH v2 04/11] sched: Simplify wake_up_*idle*()
Hi
On 29.09.2021 17:17, Peter Zijlstra wrote:
> Simplify and make wake_up_if_idle() more robust, also don't iterate
> the whole machine with preempt_disable() in its caller:
> wake_up_all_idle_cpus().
>
> This prepares for another wake_up_if_idle() user that needs a full
> do_idle() cycle.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
This patch recently landed in linux-next as commit 8850cb663b5c ("sched:
Simplify wake_up_*idle*()"). It triggers the following lockdep warning on
the arm64 virt machine under QEMU during a system suspend/resume cycle:
--->8---
printk: Suspending console(s) (use no_console_suspend to debug)
============================================
WARNING: possible recursive locking detected
5.15.0-rc6-next-20211022 #10905 Not tainted
--------------------------------------------
rtcwake/1326 is trying to acquire lock:
ffffd4e9192e8130 (cpu_hotplug_lock){++++}-{0:0}, at:
wake_up_all_idle_cpus+0x24/0x98
but task is already holding lock:
ffffd4e9192e8130 (cpu_hotplug_lock){++++}-{0:0}, at:
suspend_devices_and_enter+0x740/0x9f0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(cpu_hotplug_lock);
lock(cpu_hotplug_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
5 locks held by rtcwake/1326:
#0: ffff54ad86a78438 (sb_writers#7){.+.+}-{0:0}, at: ksys_write+0x64/0xf0
#1: ffff54ad84094a88 (&of->mutex){+.+.}-{3:3}, at:
kernfs_fop_write_iter+0xf4/0x1a8
#2: ffff54ad83b17a88 (kn->active#43){.+.+}-{0:0}, at:
kernfs_fop_write_iter+0xfc/0x1a8
#3: ffffd4e9192efab0 (system_transition_mutex){+.+.}-{3:3}, at:
pm_suspend+0x214/0x3d0
#4: ffffd4e9192e8130 (cpu_hotplug_lock){++++}-{0:0}, at:
suspend_devices_and_enter+0x740/0x9f0
stack backtrace:
CPU: 0 PID: 1326 Comm: rtcwake Not tainted 5.15.0-rc6-next-20211022 #10905
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x1d0
show_stack+0x14/0x20
dump_stack_lvl+0x88/0xb0
dump_stack+0x14/0x2c
__lock_acquire+0x171c/0x17b8
lock_acquire+0x234/0x378
cpus_read_lock+0x5c/0x150
wake_up_all_idle_cpus+0x24/0x98
suspend_devices_and_enter+0x748/0x9f0
pm_suspend+0x2b0/0x3d0
state_store+0x84/0x108
kobj_attr_store+0x14/0x28
sysfs_kf_write+0x60/0x70
kernfs_fop_write_iter+0x124/0x1a8
new_sync_write+0xe8/0x1b0
vfs_write+0x1d0/0x408
ksys_write+0x64/0xf0
__arm64_sys_write+0x14/0x20
invoke_syscall+0x40/0xf8
el0_svc_common.constprop.3+0x8c/0x120
do_el0_svc_compat+0x18/0x48
el0_svc_compat+0x48/0x100
el0t_32_sync_handler+0xec/0x140
el0t_32_sync+0x170/0x174
OOM killer enabled.
Restarting tasks ... done.
PM: suspend exit
--->8---
Let me know if there is anything I can do to help debug and fix this issue.
> ---
> kernel/sched/core.c | 14 +++++---------
> kernel/smp.c | 6 +++---
> 2 files changed, 8 insertions(+), 12 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3691,15 +3691,11 @@ void wake_up_if_idle(int cpu)
> if (!is_idle_task(rcu_dereference(rq->curr)))
> goto out;
>
> - if (set_nr_if_polling(rq->idle)) {
> - trace_sched_wake_idle_without_ipi(cpu);
> - } else {
> - rq_lock_irqsave(rq, &rf);
> - if (is_idle_task(rq->curr))
> - smp_send_reschedule(cpu);
> - /* Else CPU is not idle, do nothing here: */
> - rq_unlock_irqrestore(rq, &rf);
> - }
> + rq_lock_irqsave(rq, &rf);
> + if (is_idle_task(rq->curr))
> + resched_curr(rq);
> + /* Else CPU is not idle, do nothing here: */
> + rq_unlock_irqrestore(rq, &rf);
>
> out:
> rcu_read_unlock();
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -1170,14 +1170,14 @@ void wake_up_all_idle_cpus(void)
> {
> int cpu;
>
> - preempt_disable();
> + cpus_read_lock();
> for_each_online_cpu(cpu) {
> - if (cpu == smp_processor_id())
> + if (cpu == raw_smp_processor_id())
> continue;
>
> wake_up_if_idle(cpu);
> }
> - preempt_enable();
> + cpus_read_unlock();
> }
> EXPORT_SYMBOL_GPL(wake_up_all_idle_cpus);
>
>
>
>
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland