Message-ID: <20251022121345.23496-1-piliu@redhat.com>
Date: Wed, 22 Oct 2025 20:13:42 +0800
From: Pingfan Liu <piliu@...hat.com>
To: kexec@...ts.infradead.org,
linux-kernel@...r.kernel.org
Cc: Pingfan Liu <piliu@...hat.com>,
Waiman Long <longman@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Pierre Gondois <pierre.gondois@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Baoquan He <bhe@...hat.com>,
Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Valentin Schneider <vschneid@...hat.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
Joel Granados <joel.granados@...nel.org>
Subject: [RFC 0/3] kexec: Force kexec to proceed under heavy deadline load
During discussion of the scheduler deadline bug [1], Pierre Gondois
pointed out a potential issue during kexec: as CPUs are unplugged, the
available DL bandwidth of the root domain gradually decreases. At some
point, insufficient bandwidth triggers an overflow detection, causing
CPU hot-removal to fail and kexec to hang [2].
I reproduced it on a system with 160 CPUs using the following commands:
    seq 10 | xargs -I{} -P10 sh -c 'chrt -d -T 1000000 -P 1000000 0 yes > /dev/null &'
    kexec -e
The system hangs during the kexec process.
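To make the arithmetic behind the hang concrete, here is a rough userspace model of the bandwidth check, not the kernel code. It assumes a per-CPU cap of 95% (the default sched_rt_runtime_us / sched_rt_period_us ratio), equal CPU capacities, and no bandwidth reserved by anything other than the DL tasks. The function name first_failing_removal is hypothetical.

```python
# Simplified model of the root-domain DL bandwidth check during CPU hot-removal.
# Assumptions (not taken from the patches): 95% cap per CPU, equal CPU
# capacities, and only the listed DL tasks consume bandwidth.

def first_failing_removal(nr_cpus, task_utils, cap_per_cpu=0.95):
    """Return the online CPU count at which the next removal is refused.

    Returns None if every CPU except the last one can be removed.
    """
    total_bw = sum(task_utils)
    cpus = nr_cpus
    while cpus > 1:
        remaining = cpus - 1
        # Overflow: the DL tasks need more than the remaining CPUs may offer.
        if total_bw > cap_per_cpu * remaining:
            return cpus
        cpus = remaining
    return None

# Reproducer: 10 tasks, each with runtime 1000000 ns per period 1000000 ns.
task_utils = [1_000_000 / 1_000_000] * 10
stuck_at = first_failing_removal(160, task_utils)
print(stuck_at)
```

Under these assumptions the model stops with 11 CPUs still online: ten tasks at full utilization need 10 CPUs' worth of bandwidth, but 10 remaining CPUs may offer only 9.5. The real kernel check uses fixed-point bandwidth and CPU capacities, so the exact point of failure may differ.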
This series skips the DL bandwidth check while kexec is in progress,
migrates tasks from each dying CPU directly to the kexec CPU, and
promotes the kexec task to a DL task. This way, the heavy deadline load
cannot starve the CPU hot-removal kthread, and the kexec task can make
progress.
In contrast to this series, a more aggressive alternative is to send
SIGKILL to all DL tasks at the beginning of the kexec process.
Let us discuss how to resolve this issue.
[1]: https://lore.kernel.org/all/20250929133602.32462-1-piliu@redhat.com/
[2]: https://lore.kernel.org/all/3408aca5-e6c9-434a-9950-82e9147fcbba@arm.com/
Pingfan Liu (3):
  sched/deadline: Skip the deadline bandwidth check if kexec_in_progress
  kernel/cpu: Mark nonboot cpus as inactive when shutting down nonboot
    cpus
  kexec_core: Promote the kexec to DL task

 kernel/cpu.c            | 10 ++++++++++
 kernel/kexec_core.c     | 28 ++++++++++++++++++++++++++++
 kernel/sched/deadline.c |  7 +++++++
 3 files changed, 45 insertions(+)
--
2.49.0