linux-kernel - Re: [PATCH v3 3/5] sched/fair: Switch to task based throttle model

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <13467b08-96a6-478d-bb97-060b7d8887e4@amd.com>
Date: Wed, 3 Sep 2025 16:01:03 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Aaron Lu <ziqianlu@...edance.com>
CC: Valentin Schneider <vschneid@...hat.com>, Ben Segall <bsegall@...gle.com>,
	Peter Zijlstra <peterz@...radead.org>, Chengming Zhou
	<chengming.zhou@...ux.dev>, Josh Don <joshdon@...gle.com>, Ingo Molnar
	<mingo@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, Xi Wang
	<xii@...gle.com>, <linux-kernel@...r.kernel.org>, Juri Lelli
	<juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>, "Steven
 Rostedt" <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>, Chuyi Zhou
	<zhouchuyi@...edance.com>, Jan Kiszka <jan.kiszka@...mens.com>, "Florian
 Bezdeka" <florian.bezdeka@...mens.com>, Songtang Liu
	<liusongtang@...edance.com>, Sebastian Andrzej Siewior
	<bigeasy@...utronix.de>
Subject: Re: [PATCH v3 3/5] sched/fair: Switch to task based throttle model

Hello Aaron,

On 9/3/2025 3:41 PM, Aaron Lu wrote:
> Hi Prateek,
> 
> On Wed, Sep 03, 2025 at 02:41:55PM +0530, K Prateek Nayak wrote:
>> Hello Aaron,
>>
>> On 9/3/2025 12:44 PM, Aaron Lu wrote:
>>> On Fri, Aug 22, 2025 at 07:07:01PM +0800, Aaron Lu wrote:
>>>> With this said, I reduced the task number and retested on this 2nd AMD
>>>> Genoa:
>>>> - quota set to 50 cpu for each level1 cgroup;
>>
>> What exactly is the quota and period when you say 50cpu?
> 
> period is the default 100000 and quota is set to 5000000.

Thank you! I'll do some tests on my end as well.

[..snip..]
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index df8dc389af8e1..3e927b9b7eeb6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9370,6 +9370,9 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>  	if ((p->se.sched_delayed) && (env->migration_type != migrate_load))
>  		return 0;
>  
> +	if (throttled_hierarchy(task_group(p)->cfs_rq[env->dst_cpu]))
> +		return 0;
> +

This makes sense instead of the full throttled_lb_pair(). You'll still
need to put it behind CONFIG_CGROUP_SCHED (or better yet
CONFIG_CFS_BANDWIDTH) since task_group() can return NULL if GROUP_SCHED
is not enabled.

>  	/*
>  	 * We want to prioritize the migration of eligible tasks.
>  	 * For ineligible tasks we soft-limit them and only allow
> 
>          base              head'               diff     head(patch1-5)
> Time     82.55±4.82%       83.81±2.89%         -1.5%    99.69±6.71%
> 
> head': head + above diff
> 
> I also tested netperf on this AMD system as well as hackbench and
> netperf on Intel EMR, no obvious performance difference observed
> after applying the above diff, i.e. base and head' performance is
> roughly the same.
> 
> Does the above diff make sense? One thing I'm slightly concerned is,
> there may be one case when balancing a task to a throttled target
> cfs_rq makes sense: if the task holds some kernel resource and is
> running inside kernel, even its target cfs_rq is throttled, we still
> want this task to go there and finish its job in kernel mode sooner,
> this could help other resource waiters. But, this may not be a big deal

I think it is still an improvement over per-cfs_rq throttling from a
tail latency perspective.

> and in most of the time, balancing a task to a throttled cfs_rq doesn't
> look like a meaningful thing to do.Ack.

-- 
Thanks and Regards,
Prateek