Message-ID: <20251028030914.9520-1-piliu@redhat.com>
Date: Tue, 28 Oct 2025 11:09:12 +0800
From: Pingfan Liu <piliu@...hat.com>
To: kexec@...ts.infradead.org,
	linux-kernel@...r.kernel.org
Cc: Pingfan Liu <piliu@...hat.com>,
	Waiman Long <longman@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Juri Lelli <juri.lelli@...hat.com>,
	Pierre Gondois <pierre.gondois@....com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Baoquan He <bhe@...hat.com>,
	Ingo Molnar <mingo@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Valentin Schneider <vschneid@...hat.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
	Joel Granados <joel.granados@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: [PATCHv2 0/2]: kexec: Force kexec to proceed under heavy deadline load

During discussion of the scheduler deadline bug [1], Pierre Gondois
pointed out a potential issue during kexec: as CPUs are unplugged, the
available DL bandwidth of the root domain gradually decreases. At some
point, insufficient bandwidth triggers an overflow detection, causing
CPU hot-removal to fail and kexec to hang.[2]
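
As a rough illustration of the arithmetic only (not the kernel's actual
code path), the sketch below models admission as "sum of task bandwidths
must not exceed online CPUs times the RT limit". It assumes the default
limit of 95% (sched_rt_runtime_us=950000, sched_rt_period_us=1000000) and
ignores fair-server reservations and asymmetric CPU capacity. The helper
names are hypothetical. The workload is ten tasks with runtime equal to
period, as in the reproducer in this message.

```python
from fractions import Fraction


def dl_capacity(online_cpus, rt_runtime_us=950_000, rt_period_us=1_000_000):
    """Total DL bandwidth assumed available with this many CPUs online.

    Simplified model: every CPU contributes rt_runtime_us / rt_period_us.
    """
    return online_cpus * Fraction(rt_runtime_us, rt_period_us)


def first_failing_cpu_count(total_cpus, task_bandwidths, **limits):
    """Return the online-CPU count at which hot-removal would overflow.

    CPUs are removed one at a time. The result is the first remaining CPU
    count whose capacity is below the total demand, or None if removal
    never overflows in this model.
    """
    demand = sum(task_bandwidths)
    for remaining in range(total_cpus - 1, 0, -1):
        if demand > dl_capacity(remaining, **limits):
            return remaining
    return None


# Ten DL tasks, each with runtime == period (bandwidth 1.0).
tasks = [Fraction(1_000_000, 1_000_000)] * 10
print(first_failing_cpu_count(160, tasks))
```

Under these assumptions the removal that would leave 10 CPUs online is
the first to be rejected, since 10 * 0.95 = 9.5 is less than the 10.0
reserved by the tasks.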
    
I reproduced it on a system with 160 CPUs using the following commands:
  seq 10 | xargs -I{} -P10 sh -c 'chrt -d -T 1000000 -P 1000000 0 yes > /dev/null &'
  kexec -e

The system hung during the kexec process.
 
This series skips the DL bandwidth check while kexec is in progress and
sends SIGSTOP to all userspace DL tasks so that the kexec process can
proceed.

[1]: https://lore.kernel.org/all/20250929133602.32462-1-piliu@redhat.com/
[2]: https://lore.kernel.org/all/3408aca5-e6c9-434a-9950-82e9147fcbba@arm.com/

RFC -> v2:
Instead of migrating the DL tasks, SIGSTOP them.

Pingfan Liu (2):
  sched/deadline: Skip the deadline bandwidth check if kexec_in_progress
  kernel/kexec: Stop all userspace deadline tasks

 kernel/kexec_core.c     | 23 +++++++++++++++++++++++
 kernel/sched/deadline.c |  7 +++++++
 2 files changed, 30 insertions(+)

-- 
2.49.0

