Message-ID: <4aa7ab1e-b006-491f-8224-63dbc86295a3@kylinos.cn>
Date: Tue, 15 Jul 2025 13:54:43 +0800
From: Zihuan Zhang <zhangzihuan@...inos.cn>
To: Christian Loehle <christian.loehle@....com>, xuewen.yan@...soc.com,
vincent.guittot@...aro.org, mingo@...hat.com, peterz@...radead.org,
juri.lelli@...hat.com
Cc: rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
vschneid@...hat.com, hongyan.xia2@....com, linux-kernel@...r.kernel.org,
ke.wang@...soc.com, di.shen@...soc.com, xuewen.yan94@...il.com,
kprateek.nayak@....com, kuyo.chang@...iatek.com, juju.sung@...iatek.com,
qyousef@...alina.io
Subject: Re: [PATCH v1] sched/uclamp: Exclude kernel threads from uclamp logic

On 2025/7/10 16:41, Christian Loehle wrote:
> On 7/10/25 01:47, Zihuan Zhang wrote:
>> Hi Christian,
>> Apologies for the late reply, and thanks for raising the concerns.
>>
>> On 2025/7/3 18:17, Christian Loehle wrote:
>>> On 7/3/25 11:07, Zihuan Zhang wrote:
>>>> Hi Christian,
>>>>
>>>> Thanks for the question!
>>>>
>>>> On 2025/7/3 17:22, Christian Loehle wrote:
>>>>> On 7/3/25 10:14, Zihuan Zhang wrote:
>>>>>> Kernel threads (PF_KTHREAD) are not subject to user-defined utilization
>>>>>> clamping. They do not represent user workloads and should not participate
>>>>>> in any uclamp logic, including:
>>>>> Why not?
>>>>>
>>>> As Xuewen mentioned, some kernel threads may intentionally set scheduling attributes for performance. So instead of unconditionally excluding all kernel threads, I’m now considering a more conservative approach:
>>>> skip only those kthreads that haven’t explicitly set any clamp values.
>>>>
>>>> This should help avoid unintended clamp aggregation while still supporting performance-tuned kthreads.
>>> I'm skeptical; fundamentally, you cannot exclude some fair tasks from uclamp logic.
>>> At the very least they will be affected by the cpufreq part, so if you 'exclude' some
>>> kthread that doesn't have clamps set (i.e. has min=0, max=1024), its
>>> utilization may end up clamped by other task(s) and thus not contribute
>>> properly to sugov frequency selection (let's say you only have one other task with
>>> max=0: excluding the unclamped kthread now leads to sugov requesting
>>> the lowest OPP. Is that always correct/desired?)
>>>
>>> Is there a specific issue you're trying to solve?
>>> FYI, there has been discussion around reworking the uclamp mechanism to solve
>>> some of the issues you may have been facing, but so far it hasn't led anywhere:
>>> https://lore.kernel.org/lkml/cover.1741091349.git.hongyan.xia2@arm.com/
>> Our original motivation stems from the observation that uclamp is primarily designed to manage frequency selection based on user-space task behavior. Kernel threads typically do not represent user workloads and are often not considered meaningful participants in uclamp-driven decisions.
> Two comments on that:
> - It's also used to drive task placement, not just frequency selection.
> - There can be cases where a kthread is fundamentally part of a user workload;
> io_uring comes to mind, but others exist too.
>
>> To be clear, we are not aiming to exclude all kthreads from affecting frequency, but rather to explore ways to avoid unnecessary uclamp aggregation overhead from kernel threads that have no explicit clamp values set (i.e. uclamp.min=0, max=1024).
>> As you pointed out, fully excluding these tasks might interfere with sugov behavior in certain edge cases. So a more balanced approach might be:
>>
>> - For kernel threads that do not set any clamp values, skip the clamp aggregation step
>>
>> - If a kernel thread explicitly sets clamp attributes, it should of course remain fully visible to uclamp logic.
>>
>> This would preserve correctness while reducing unnecessary overhead in the hot path, especially on systems with many runnable tasks.
> So excluding an unclamped task from uclamp will definitely affect the UCLAMP_MAX
> result: as I've mentioned above, you'll apply other tasks' UCLAMP_MAX restrictions
> even though the kthread has UCLAMP_MAX==1024. That is not always desirable.
> Or would you let it take part in uclamp if the user explicitly set UCLAMP_MAX==1024
> instead of relying on the default? That wouldn't be consistent IMO.
>
> Regarding the optimization part:
> Is there a specific workload where the overhead is an issue? It should
> be rather small. Some numbers would help.
You’re absolutely right: excluding unclamped kernel threads entirely
can unintentionally affect UCLAMP_MAX aggregation and may lead to
undesirable behavior in edge cases. I agree that this would not be a
consistent or generally correct approach.
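
To make the UCLAMP_MAX effect concrete, here is a minimal userspace sketch of the scenario you described. It is a simplified model, not kernel code: per-rq clamps are modelled as a plain MAX over the runnable tasks (no buckets, no idle-rq handling), a min/max inversion is resolved in favor of the min clamp as a modelling choice, and struct sim_task, rq_clamped_util() and example_rq are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SCHED_CAPACITY_SCALE 1024u

/* Hypothetical, simplified task model; not the kernel's task_struct. */
struct sim_task {
	bool is_kthread;          /* models PF_KTHREAD */
	bool user_defined;        /* models clamps set explicitly, e.g. via sched_setattr() */
	unsigned int util;        /* utilization contribution, 0..1024 */
	unsigned int uclamp_min;
	unsigned int uclamp_max;
};

static unsigned int clamp_uint(unsigned int v, unsigned int lo, unsigned int hi)
{
	assert(lo <= hi);
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * Simplified rq-level uclamp model: both clamps are MAX-aggregated over the
 * runnable tasks, then the summed utilization is clamped. With
 * skip_unclamped_kthreads set, kthreads without explicit clamps still add
 * their utilization but are left out of the clamp aggregation, which is the
 * behavior proposed in this thread.
 */
static unsigned int rq_clamped_util(const struct sim_task *tasks, size_t n,
				    bool skip_unclamped_kthreads)
{
	unsigned int util = 0, rq_min = 0, rq_max = 0;

	for (size_t i = 0; i < n; i++) {
		const struct sim_task *t = &tasks[i];

		util += t->util;
		if (skip_unclamped_kthreads && t->is_kthread && !t->user_defined)
			continue;
		if (t->uclamp_min > rq_min)
			rq_min = t->uclamp_min;
		if (t->uclamp_max > rq_max)
			rq_max = t->uclamp_max;
	}

	/* Modelling choice: on a min/max inversion, the min clamp wins. */
	if (rq_min > rq_max)
		return rq_min;
	return clamp_uint(util, rq_min, rq_max);
}

/* The scenario above: one unclamped kthread plus one task with max=0. */
static const struct sim_task example_rq[] = {
	{ .is_kthread = true,  .user_defined = false, .util = 300,
	  .uclamp_min = 0, .uclamp_max = SCHED_CAPACITY_SCALE },
	{ .is_kthread = false, .user_defined = true,  .util = 200,
	  .uclamp_min = 0, .uclamp_max = 0 },
};
```

With example_rq, rq_clamped_util(example_rq, 2, false) returns 500, while rq_clamped_util(example_rq, 2, true) returns 0: once the kthread is left out of the aggregation, the other task's max=0 also caps the kthread's utilization, so in this model sugov would end up at the lowest OPP.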

At this stage, I think the idea still lacks maturity, and I appreciate
your input in highlighting the possible implications. I’m currently
diving deeper into the schedutil governor code to better understand how
uclamp aggregation interacts with frequency selection and task placement
in real workloads.

With that in mind, I’ll take a step back and revisit the broader problem
from a more informed perspective. Hopefully, in the near future, I’ll
come up with a more solid and well-justified solution.

Thanks again for your time and insights.

Best regards,
Zihuan Zhang