linux-kernel - Re: [PATCH] sched/deadline: Derive root domain from active cpu in task's cpus

Message-ID: <CAF+s44Tv1n0b1GSghSPP3xDPK4qzbzc629XMB9btzXuKgfKvcA@mail.gmail.com>
Date: Tue, 14 Oct 2025 21:09:24 +0800
From: Pingfan Liu <piliu@...hat.com>
To: Pierre Gondois <pierre.gondois@....com>
Cc: Juri Lelli <juri.lelli@...hat.com>, Peter Zijlstra <peterz@...radead.org>, 
	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>, 
	Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Valentin Schneider <vschneid@...hat.com>
Subject: Re: [PATCH] sched/deadline: Derive root domain from active cpu in
 task's cpus_ptr

Hi Pierre,

Thanks for sharing your perspective.

On Sat, Oct 11, 2025 at 12:26 AM Pierre Gondois <pierre.gondois@....com> wrote:
>
>
> On 10/6/25 14:12, Juri Lelli wrote:
> > On 06/10/25 12:13, Pierre Gondois wrote:
> >> On 9/30/25 11:04, Peter Zijlstra wrote:
> >>> On Tue, Sep 30, 2025 at 08:20:06AM +0100, Juri Lelli wrote:
> >>>
> >>>> I actually wonder if we shouldn't make cppc_fie a "special" DEADLINE
> >>>> tasks (like schedutil [1]). IIUC that is how it is thought to behave
> >>>> already [2], but, since it's missing the SCHED_FLAG_SUGOV flag(/hack),
> >>>> it is not "transparent" from a bandwidth tracking point of view.
> >>>>
> >>>> 1 -https://elixir.bootlin.com/linux/v6.17/source/kernel/sched/cpufreq_schedutil.c#L661
> >>>> 2 -https://elixir.bootlin.com/linux/v6.17/source/drivers/cpufreq/cppc_cpufreq.c#L198
> >>> Right, I remember that hack. Bit sad it's spreading, but this CPPC thing
> >>> is very much like the schedutil one, so might as well do that I suppose.
> >> IIUC, the sugov thread was switched to deadline to allow frequency updates
> >> when deadline tasks start to run. I.e. there should be no point updating the
> >> freq. after the deadline task finished running, cf [1] and [2]
> >>
> >> The CPPC FIE worker should not require to run that quickly as it seems to be
> >> more like a freq. maintenance work (the call comes from the sched tick)
> >>
> >> sched_tick()
> >> \-arch_scale_freq_tick() / topology_scale_freq_tick()
> >>    \-set_freq_scale() / cppc_scale_freq_tick()
> >>      \-irq_work_queue()
> > OK, but how much bandwidth is enough for it (on different platforms)?
> > Also, I am not sure the worker follows cpusets/root domain changes.
> >
> >
> To share some additional information, I was able to reproduce the issue by
> creating as many deadline tasks with a huge bandwidth as the platform
> allows:
> chrt -d -T 1000000 -P 1000000 0 yes > /dev/null &
>
> Then kexec to another kernel. The available bandwidth of the root domain
> gradually decreases with the number of CPUs unplugged.
> At some point, there is not enough bandwidth and an overflow is detected.
> (Same call stack as in the original message).
>
> So I'm not sure this is really related to the cppc_fie thread.
> I think it's more related to checking the available bandwidth in a context
> which is not appropriate. The deadline bandwidth might be insufficient when
> the platform is reset, but this should not be that important.
>

I think there are two independent issues.

In your experiment, as CPUs are hot-removed one by one, at some point
the hot-removal will fail due to insufficient DL bandwidth. There
should be a warning message to inform users about what's happening,
and users can then remove some DL tasks to continue the CPU
hot-removal.

Meanwhile, in the kexec case, this check can be skipped, since the
system cannot roll back to a working state anyway.
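
To make the two cases concrete, here is a rough Python model of the
accounting. It is only a sketch under stated assumptions, not the kernel
code: all function names are hypothetical, the 95% per-CPU cap is an
assumed limit, and CPU capacity scaling is ignored. The task parameters
mirror your chrt example (runtime equal to period).

```python
# Simplified, hypothetical model of DL bandwidth accounting in one root domain.
# None of these names are kernel functions.

BW_SHIFT = 20  # fixed-point scale for bandwidth ratios (assumption)


def to_ratio(period: int, runtime: int) -> int:
    """Return runtime/period as a fixed-point fraction."""
    return (runtime << BW_SHIFT) // period


def admit_tasks(cpus: int, task_bw: int, cap_per_cpu: int) -> int:
    """Count how many identical tasks fit under the root domain cap."""
    admitted = 0
    while (admitted + 1) * task_bw <= cpus * cap_per_cpu:
        admitted += 1
    return admitted


def can_remove_cpu(cpus: int, total_bw: int, cap_per_cpu: int,
                   shutting_down: bool = False) -> bool:
    """Decide whether one CPU may be removed from the root domain."""
    if shutting_down:
        # kexec/reboot path: no rollback is possible, so skip the check.
        return True
    # Normal hot-removal: the remaining CPUs must still cover what was admitted.
    return total_bw <= (cpus - 1) * cap_per_cpu


CAP = to_ratio(1_000_000, 950_000)     # assumed 95% cap per CPU
TASK = to_ratio(1_000_000, 1_000_000)  # runtime == period, as in the chrt example

n = admit_tasks(4, TASK, CAP)
print(n, can_remove_cpu(4, n * TASK, CAP), can_remove_cpu(4, n * TASK, CAP, True))
```

In this model, four CPUs admit three such tasks, the first normal
hot-removal is refused (where a warning would help the user), and the
same removal is allowed on the shutdown path.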


Thanks,

Pingfan
> ---
>
> Question:
> Since the cppc_fie worker doesn't have the SCHED_FLAG_SUGOV flag,
> is this comment actually correct ?
> /*
>   * Fake (unused) bandwidth; workaround to "fix"
>   * priority inheritance.
>   */
>
> ---
>
> On a non-deadline related topic, the CPPC driver creates a cppc_fie
> worker in case the CPPC counters used to estimate the current frequency
> are in PCC channels.
> Accessing these channels requires going through sleeping sections,
> that's why a worker is used.
>
> However, CPPC counters might be accessed through FFH, which doesn't go
> through sleeping sections. In such a case, the cppc_fie worker is never
> used and never removed, so it would be nice to remove it.
>