[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <082A11A1-404B-4FFE-B6DC-64A543CA61AD@infradead.org>
Date: Mon, 16 Dec 2024 17:41:27 +0000
From: David Woodhouse <dwmw2@...radead.org>
To: Thomas Gleixner <tglx@...utronix.de>, Peter Zijlstra <peterz@...radead.org>
CC: Stefan Hajnoczi <stefanha@...hat.com>, Jason Wang <jasowang@...hat.com>,
"x86@...nel.org" <x86@...nel.org>, hpa <hpa@...or.com>,
dyoung <dyoung@...hat.com>, kexec <kexec@...ts.infradead.org>,
linux-ext4 <linux-ext4@...r.kernel.org>,
"Michael S. Tsirkin" <mst@...hat.com>,
Stefano Garzarella <sgarzare@...hat.com>, eperezma <eperezma@...hat.com>,
Paolo Bonzini <bonzini@...hat.com>, ming.lei@...hat.com,
Petr Mladek <pmladek@...e.com>, John Ogness <jogness@...utronix.de>
Subject: Re: [PATCH] sched: Prevent rescheduling when interrupts are disabled
On 16 December 2024 13:20:56 GMT, Thomas Gleixner <tglx@...utronix.de> wrote:
>David reported a warning observed while loop testing kexec jump:
>
> Interrupts enabled after irqrouter_resume+0x0/0x50
> WARNING: CPU: 0 PID: 560 at drivers/base/syscore.c:103 syscore_resume+0x18a/0x220
> kernel_kexec+0xf6/0x180
> __do_sys_reboot+0x206/0x250
> do_syscall_64+0x95/0x180
>
>The corresponding interrupt flag trace:
>
> hardirqs last enabled at (15573): [<ffffffffa8281b8e>] __up_console_sem+0x7e/0x90
> hardirqs last disabled at (15580): [<ffffffffa8281b73>] __up_console_sem+0x63/0x90
>
>That means __up_console_sem() was invoked with interrupts enabled. Further
>instrumentation revealed that in the interrupt disabled section of kexec
>jump one of the syscore_suspend() callbacks woke up a task, which set the
>NEED_RESCHED flag. A later callback in the resume path invoked
>cond_resched() which in turn led to the invocation of the scheduler:
>
> __cond_resched+0x21/0x60
> down_timeout+0x18/0x60
> acpi_os_wait_semaphore+0x4c/0x80
> acpi_ut_acquire_mutex+0x3d/0x100
> acpi_ns_get_node+0x27/0x60
> acpi_ns_evaluate+0x1cb/0x2d0
> acpi_rs_set_srs_method_data+0x156/0x190
> acpi_pci_link_set+0x11c/0x290
> irqrouter_resume+0x54/0x60
> syscore_resume+0x6a/0x200
> kernel_kexec+0x145/0x1c0
> __do_sys_reboot+0xeb/0x240
> do_syscall_64+0x95/0x180
>
>This is a long standing problem, which probably got more visible with
>the recent printk changes. Something does a task wakeup and the
>scheduler sets the NEED_RESCHED flag. cond_resched() sees it set and
>invokes schedule() from a completely bogus context. The scheduler
>enables interrupts after context switching, which causes the above
>warning at the end.
>
>Quite some of the code paths in syscore_suspend()/resume() can result in
>triggering a wakeup with the exactly same consequences. They might not
>have done so yet, but as they share a lot of code with normal operations
>it's just a question of time.
>
>The problem only affects the PREEMPT_NONE and PREEMPT_VOLUNTARY scheduling
>models. Full preemption is not affected as cond_resched() is disabled and
>the preemption check preemptible() takes the interrupt disabled flag into
>account.
>
>Cure the problem by adding a corresponding check into cond_resched().
>
>Reported-by: David Woodhouse <dwmw2@...radead.org>
>Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
>Tested-by: David Woodhouse <dwmw2@...radead.org>
>Cc: stable@...r.kernel.org
>Closes: https://lore.kernel.org/all/7717fe2ac0ce5f0a2c43fdab8b11f4483d54a2a4.camel@infradead.org
>---
> kernel/sched/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>--- a/kernel/sched/core.c
>+++ b/kernel/sched/core.c
>@@ -7276,7 +7276,7 @@ void rt_mutex_setprio(struct task_struct
> #if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC)
> int __sched __cond_resched(void)
> {
>- if (should_resched(0)) {
>+ if (should_resched(0) && !irqs_disabled()) {
> preempt_schedule_common();
> return 1;
> }
Thank you. Slight preference for dwmw@...zon.co.uk as the Reported-by and Tested-by addresses if it's not too late. I'm assuming you will handle this and don't want me to round it up with the kexec bits.
Powered by blists - more mailing lists