Message-ID: <2cbb3467-1b49-4c86-9fad-9c75ce7d9c8f@arm.com>
Date: Mon, 29 Jul 2024 11:50:40 +0200
From: Pierre Gondois <pierre.gondois@....com>
To: stable@...r.kernel.org, Sasha Levin <sashal@...nel.org>,
 Lukasz Luba <Lukasz.Luba@....com>
Cc: linux-kernel@...r.kernel.org, Qais Yousef <qyousef@...alina.io>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>, Ingo Molnar <mingo@...hat.com>,
 Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v3] sched/fair: Use all little CPUs for CPU-bound workload

Hello Sasha,
Would it be possible to pick this patch for the 6.1 stable branch?
Or is there something else I should do to request this?

Regards,
Pierre

On 6/25/24 15:25, Pierre Gondois wrote:
> Hello stable folk,
> 
> This patch was merged as:
>     commit 3af7524b1419 ("sched/fair: Use all little CPUs for CPU-bound workloads")
> into 6.7, improving the following:
>     commit 0b0695f2b34a ("sched/fair: Rework load_balance()")
> 
> Would it be possible to port it to the 6.1 stable branch?
> The patch should apply cleanly when cherry-picked onto v6.1.94.
> 
> Regards,
> Pierre
> 
> 
> On 12/6/23 10:00, Pierre Gondois wrote:
>> Running n CPU-bound tasks on an n-CPU platform:
>> - with asymmetric CPU capacity
>> - that is not a DynamIQ system (i.e. having a PKG level sched domain
>>     without the SD_SHARE_PKG_RESOURCES flag set)
>> might result in a task placement where two tasks run on a big CPU
>> and none on a little CPU. Using all CPUs would give a better
>> placement.
>>
>> Testing platform:
>> Juno-r2:
>> - 2 big CPUs (1-2), maximum capacity of 1024
>> - 4 little CPUs (0,3-5), maximum capacity of 383
>>
>> Testing workload ([1]):
>> Spawn 6 CPU-bound tasks. During the first 100ms (step 1), each task
>> is affine to a CPU, except for:
>> - one little CPU which is left idle.
>> - one big CPU which has 2 tasks affine to it.
>> After the 100ms (step 2), remove the cpumask affinity.
>>
>> Before patch:
>> During step 2, the load balancer running from the idle CPU tags sched
>> domains as:
>> - little CPUs: 'group_has_spare'. Cf. group_has_capacity() and
>>     group_is_overloaded(), 3 CPU-bound tasks run on a 4 CPUs
>>     sched-domain, and the idle CPU provides enough spare capacity
>>     regarding the imbalance_pct
>> - big CPUs: 'group_overloaded'. Indeed, 3 tasks run on a 2 CPUs
>>     sched-domain, so the early exit in the following path is not
>>     taken and the group is classified from its utilization:
>>     group_is_overloaded()
>>     \-if (sgs->sum_nr_running <= sgs->group_weight) return false;
>>
>>     The following path which would change the migration type to
>>     'migrate_task' is not taken:
>>     calculate_imbalance()
>>     \-if (env->idle != CPU_NOT_IDLE && env->imbalance == 0)
>>     as the local group has some spare capacity, so the imbalance
>>     is not 0.
>>
>> The migration type requested is 'migrate_util' and the busiest
>> runqueue is the big CPU's runqueue having 2 tasks (each having a
>> utilization of 512). The idle little CPU cannot pull one of these
>> tasks as its capacity is too small for the task. The following path
>> is used:
>> detach_tasks()
>> \-case migrate_util:
>>     \-if (util > env->imbalance) goto next;
>>
>> After patch:
>> As the number of failed balancing attempts grows (with
>> 'nr_balance_failed'), progressively make it easier to migrate
>> a big task to the idling little CPU. A similar mechanism is
>> used for the 'migrate_load' migration type.
>>
>> Improvement:
>> Running the testing workload [1] with the step 2 representing
>> a ~10s load for a big CPU:
>> Before patch: ~19.3s
>> After patch: ~18s (-6.7%)
>>
>> Similar issue reported at:
>> https://lore.kernel.org/lkml/20230716014125.139577-1-qyousef@layalina.io/
>>
>> v1:
>> https://lore.kernel.org/all/20231110125902.2152380-1-pierre.gondois@arm.com/
>> v2:
>> https://lore.kernel.org/all/20231124153323.3202444-1-pierre.gondois@arm.com/
>>
>> Suggested-by: Vincent Guittot <vincent.guittot@...aro.org>
>> Signed-off-by: Pierre Gondois <pierre.gondois@....com>
>> Reviewed-by: Vincent Guittot <vincent.guittot@...aro.org>
>> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@....com>
>> ---
>>
>> Notes:
>>       v2:
>>       - Used Vincent's approach.
>>       v3:
>>       - Updated commit message.
>>       - Added Reviewed-by tags
>>
>>    kernel/sched/fair.c | 2 +-
>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index d7a3c63a2171..9481b8cff31b 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -9060,7 +9060,7 @@ static int detach_tasks(struct lb_env *env)
>>    		case migrate_util:
>>    			util = task_util_est(p);
>>    
>> -			if (util > env->imbalance)
>> +			if (shr_bound(util, env->sd->nr_balance_failed) > env->imbalance)
>>    				goto next;
>>    
>>    			env->imbalance -= util;
> 
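
For readers of the archive, the effect of the one-line change can be sketched in userspace C. This is only an illustration under stated assumptions: the `sketch_*` names are hypothetical, `sketch_shr_bound()` approximates the kernel's shr_bound() macro (a right shift whose shift count is clamped below the width of the type), and the imbalance value of 100 used in the usage note is made up. Only the utilization of 512 comes from the commit message.

```c
#include <limits.h>

/*
 * Userspace approximation of the kernel's shr_bound(): shift right, with
 * the shift count clamped so it never reaches the width of the type.
 */
static unsigned long sketch_shr_bound(unsigned long val, unsigned int shift)
{
	unsigned int max_shift = sizeof(val) * CHAR_BIT - 1;

	return val >> (shift < max_shift ? shift : max_shift);
}

/*
 * Patched migrate_util test: returns 1 when the task would be skipped
 * ("goto next" in detach_tasks()), 0 when it may be detached.
 */
static int sketch_skip_task(unsigned long util, unsigned long imbalance,
			    unsigned int nr_balance_failed)
{
	return sketch_shr_bound(util, nr_balance_failed) > imbalance;
}

/*
 * Hypothetical helper: number of failed balancing attempts after which a
 * task with the given utilization stops being skipped.
 */
static unsigned int sketch_failures_needed(unsigned long util,
					   unsigned long imbalance)
{
	unsigned int failed = 0;

	/* Bounded loop: past the type width the shifted value no longer changes. */
	while (failed < sizeof(unsigned long) * CHAR_BIT &&
	       sketch_skip_task(util, imbalance, failed))
		failed++;

	return failed;
}
```

With a task utilization of 512 and an assumed imbalance of 100, the compared value goes 512, 256, 128, 64 as nr_balance_failed goes 0, 1, 2, 3, so the task becomes eligible on the attempt that follows three failures. Before the patch the comparison was always 512 > 100 and the task was skipped every time.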

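The group classification described under "Before patch" can be sketched the same way. This is a simplified reading of group_is_overloaded() and should be checked against the tree being patched: `struct sketch_group_stats` is a hypothetical stand-in for the kernel's sg_lb_stats, and the imbalance_pct of 117 and the utilization and runnable figures in the usage note are assumptions for illustration. Only the task counts, CPU counts and capacities come from the commit message.

```c
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the kernel's struct sg_lb_stats. */
struct sketch_group_stats {
	unsigned int sum_nr_running;	/* tasks running in the group */
	unsigned int group_weight;	/* number of CPUs in the group */
	unsigned long group_capacity;	/* summed CPU capacity */
	unsigned long group_util;	/* summed utilization */
	unsigned long group_runnable;	/* summed runnable signal */
};

static bool sketch_group_is_overloaded(unsigned int imbalance_pct,
				       const struct sketch_group_stats *sgs)
{
	/* No more tasks than CPUs: the group cannot be overloaded. */
	if (sgs->sum_nr_running <= sgs->group_weight)
		return false;

	/* Utilization exceeds capacity once the margin is applied. */
	if (sgs->group_capacity * 100 < sgs->group_util * imbalance_pct)
		return true;

	/* Runnable signal exceeds capacity by more than the margin. */
	if (sgs->group_capacity * imbalance_pct < sgs->group_runnable * 100)
		return true;

	return false;
}
```

For the Juno-r2 case, the little group has 3 tasks on 4 CPUs, so the first test returns false and the group is not overloaded. The big group has 3 tasks on 2 CPUs with a capacity of 2 * 1024 = 2048; assuming its CPU-bound tasks drive utilization to about 2048, the second test (204800 < 239616 with an imbalance_pct of 117) returns true.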