linux-kernel - Re: [PATCH] sched/deadline: Derive root domain from active cpu in task's cpus

Message-ID: <aO9q9EJbUw0QqbXv@jlelli-thinkpadt14gen4.remote.csb>
Date: Wed, 15 Oct 2025 11:35:48 +0200
From: Juri Lelli <juri.lelli@...hat.com>
To: Pingfan Liu <piliu@...hat.com>
Cc: Pierre Gondois <pierre.gondois@....com>,
	Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
	Ingo Molnar <mingo@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>
Subject: Re: [PATCH] sched/deadline: Derive root domain from active cpu in
 task's cpus_ptr

On 14/10/25 21:09, Pingfan Liu wrote:
> Hi Pierre,
> 
> Thanks for sharing your perspective.
> 
> On Sat, Oct 11, 2025 at 12:26 AM Pierre Gondois <pierre.gondois@....com> wrote:
> >
> >
> > On 10/6/25 14:12, Juri Lelli wrote:
> > > On 06/10/25 12:13, Pierre Gondois wrote:
> > >> On 9/30/25 11:04, Peter Zijlstra wrote:
> > >>> On Tue, Sep 30, 2025 at 08:20:06AM +0100, Juri Lelli wrote:
> > >>>
> > > >>>> I actually wonder if we shouldn't make cppc_fie a "special" DEADLINE
> > > >>>> task (like schedutil [1]). IIUC that is how it is thought to behave
> > >>>> already [2], but, since it's missing the SCHED_FLAG_SUGOV flag(/hack),
> > >>>> it is not "transparent" from a bandwidth tracking point of view.
> > >>>>
> > > >>>> 1 - https://elixir.bootlin.com/linux/v6.17/source/kernel/sched/cpufreq_schedutil.c#L661
> > > >>>> 2 - https://elixir.bootlin.com/linux/v6.17/source/drivers/cpufreq/cppc_cpufreq.c#L198
> > > >>> Right, I remember that hack. Bit sad it's spreading, but this CPPC thing
> > >>> is very much like the schedutil one, so might as well do that I suppose.
> > >> IIUC, the sugov thread was switched to deadline to allow frequency updates
> > >> when deadline tasks start to run. I.e. there should be no point updating the
> > > >> freq. after the deadline task finished running, cf. [1] and [2].
> > >>
> > > >> The CPPC FIE worker should not need to run that quickly, as it seems to be
> > > >> more like freq. maintenance work (the call comes from the sched tick):
> > >>
> > >> sched_tick()
> > >> \-arch_scale_freq_tick() / topology_scale_freq_tick()
> > >>    \-set_freq_scale() / cppc_scale_freq_tick()
> > >>      \-irq_work_queue()
> > > OK, but how much bandwidth is enough for it (on different platforms)?
> > > Also, I am not sure the worker follows cpusets/root domain changes.
> > >
> > >
> > To share some additional information, I could reproduce the issue by
> > creating as many deadline tasks with a huge bandwidth as the platform
> > allows:
> > chrt -d -T 1000000 -P 1000000 0 yes > /dev/null &
> >
> > Then kexec to another kernel. The available bandwidth of the root domain
> > gradually decreases with the number of CPUs unplugged.
> > At some point, there is not enough bandwidth and an overflow is detected.
> > (Same call stack as in the original message).

I tend to agree with Pingfan below: kexec (kernel crash?) is a case
where all guarantees are out the window anyway, so there is really no
point in keeping track of bandwidth and failing hotplug. I guess we
should add an ad-hoc check/bail for this case.
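
To make the accounting concrete, here is a minimal user-space sketch in C
of the overflow condition and of such a check/bail. It is not the kernel's
implementation: the struct, the function names (bw_ratio,
offline_would_overflow, try_offline_cpu), the fixed-point scale and the
95% per-CPU cap are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed-point scale for bandwidth: 1.0 == BW_UNIT (assumption of this sketch). */
#define BW_SHIFT 20
#define BW_UNIT  (1ULL << BW_SHIFT)

/* Hypothetical, simplified model of a root domain's deadline accounting. */
struct root_domain_model {
	unsigned int active_cpus;    /* CPUs still online in the domain */
	uint64_t     total_bw;       /* sum of admitted task bandwidths */
	uint64_t     max_bw_per_cpu; /* admission cap per CPU, e.g. 95% */
};

/* Bandwidth of one task as runtime/period in fixed point. */
static uint64_t bw_ratio(uint64_t period_ns, uint64_t runtime_ns)
{
	assert(period_ns > 0);
	return (runtime_ns << BW_SHIFT) / period_ns;
}

/* True if taking one CPU away leaves less capacity than already admitted. */
static bool offline_would_overflow(const struct root_domain_model *rd)
{
	assert(rd->active_cpus > 0);
	uint64_t cap_after = (uint64_t)(rd->active_cpus - 1) * rd->max_bw_per_cpu;

	return rd->total_bw > cap_after;
}

/*
 * Returns 0 if the CPU may go offline, -1 (think -EBUSY) otherwise.
 * When the system is going down (kexec/reboot), the bandwidth check is
 * skipped, since there is no working state to roll back to.
 */
static int try_offline_cpu(struct root_domain_model *rd, bool shutting_down)
{
	if (!shutting_down && offline_would_overflow(rd))
		return -1;

	rd->active_cpus--;
	return 0;
}
```

With 4 CPUs and three tasks like the chrt example above (runtime equal to
period, so a bandwidth of 1.0 each), try_offline_cpu() refuses the first
unplug unless shutting_down is set.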

> > So I'm not sure this is really related to the cppc_fie thread.
> > I think it's more related to checking the available bandwidth in a context
> > which is not appropriate. Deadline bandwidth might be lacking when the
> > platform is reset, but this should not be that important.
> >
> 
> I think there are two independent issues.
> 
> In your experiment, as CPUs are hot-removed one by one, at some point
> the hot-removal will fail due to insufficient DL bandwidth. There
> should be a warning message to inform users about what's happening,
> and users can then remove some DL tasks to continue the CPU
> hot-removal.
> 
> Meanwhile, in the kexec case, this checking can be skipped since the
> system cannot roll back to a working state anyway.
> 
> 
> Thanks,
> 
> Pingfan
> > ---
> >
> > Question:
> > Since the cppc_fie worker doesn't have the SCHED_FLAG_SUGOV flag,
> > is this comment actually correct?
> > /*
> >   * Fake (unused) bandwidth; workaround to "fix"
> >   * priority inheritance.
> >   */
> >
> > ---
> >
> > On a non-deadline related topic, the CPPC driver creates a cppc_fie
> > worker when the CPPC counters used to estimate the current frequency
> > are in PCC channels. Accessing these channels requires going through
> > sleeping sections, which is why a worker is used.
> >
> > However, CPPC counters might be accessed through FFH, which doesn't go
> > through sleeping sections. In that case, the cppc_fie worker is never
> > used and never removed, so it would be nice to remove it.
> >
>