linux-kernel - Re: [PATCH v2] sched/core: Prioritize migrating eligible tasks in sched_balance

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7c3275ac-d045-4bab-b9f4-2283acef1111@arm.com>
Date: Wed, 15 Jan 2025 13:18:33 +0000
From: Luis Machado <luis.machado@....com>
To: Hao Jia <jiahao.kernel@...il.com>, mingo@...hat.com,
 peterz@...radead.org, mingo@...nel.org, juri.lelli@...hat.com,
 vincent.guittot@...aro.org, dietmar.eggemann@....com, rostedt@...dmis.org,
 bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com
Cc: linux-kernel@...r.kernel.org, Hao Jia <jiahao1@...iang.com>
Subject: Re: [PATCH v2] sched/core: Prioritize migrating eligible tasks in
 sched_balance_rq()

On 12/23/24 09:14, Hao Jia wrote:
> From: Hao Jia <jiahao1@...iang.com>
> 
> When the PLACE_LAG scheduling feature is enabled and
> dst_cfs_rq->nr_queued is greater than 1, if a task is
> ineligible (lag < 0) on the source cpu runqueue, it will
> also be ineligible when it is migrated to the destination
> cpu runqueue. Because we will keep the original equivalent
> lag of the task in place_entity(). So if the task was
> ineligible before, it will still be ineligible after
> migration.
> 
> So in sched_balance_rq(), we prioritize migrating eligible
> tasks, and we soft-limit ineligible tasks, allowing them
> to migrate only when nr_balance_failed is non-zero to
> avoid load-balancing trying very hard to balance the load.
> 
> Below are some benchmark test results. From my test results,
> this patch shows a slight improvement on hackbench.
> 
> Benchmark
> =========
> 
> All of the benchmarks are done inside a normal cpu cgroup in a
> clean environment with cpu turbo disabled, and test machine is:
> 
> Single NUMA machine model is 13th Gen Intel(R) Core(TM)
> i7-13700, 12 Core/24 HT.
> 
> Based on master b86545e02e8c.
> 
> Results
> =======
> 
> hackbench-process-pipes
>                       vanilla                  patched
> Amean     1       0.5837 (   0.00%)      0.5733 (   1.77%)
> Amean     4       1.4423 (   0.00%)      1.4503 (  -0.55%)
> Amean     7       2.5147 (   0.00%)      2.4773 (   1.48%)
> Amean     12      3.9347 (   0.00%)      3.8880 (   1.19%)
> Amean     21      5.3943 (   0.00%)      5.3873 (   0.13%)
> Amean     30      6.7840 (   0.00%)      6.6660 (   1.74%)
> Amean     48      9.8313 (   0.00%)      9.6100 (   2.25%)
> Amean     79     15.4403 (   0.00%)     14.9580 (   3.12%)
> Amean     96     18.4970 (   0.00%)     17.9533 (   2.94%)
> 
> hackbench-process-sockets
>                       vanilla                  patched
> Amean     1       0.6297 (   0.00%)      0.6223 (   1.16%)
> Amean     4       2.1517 (   0.00%)      2.0887 (   2.93%)
> Amean     7       3.6377 (   0.00%)      3.5670 (   1.94%)
> Amean     12      6.1277 (   0.00%)      5.9290 (   3.24%)
> Amean     21     10.0380 (   0.00%)      9.7623 (   2.75%)
> Amean     30     14.1517 (   0.00%)     13.7513 (   2.83%)
> Amean     48     24.7253 (   0.00%)     24.2287 (   2.01%)
> Amean     79     43.9523 (   0.00%)     43.2330 (   1.64%)
> Amean     96     54.5310 (   0.00%)     53.7650 (   1.40%)
> 
> tbench4 Throughput
>                       vanilla                  patched
> Hmean     1       255.97 (   0.00%)      275.01 (   7.44%)
> Hmean     2       511.60 (   0.00%)      544.27 (   6.39%)
> Hmean     4       996.70 (   0.00%)     1006.57 (   0.99%)
> Hmean     8      1646.46 (   0.00%)     1649.15 (   0.16%)
> Hmean     16     2259.42 (   0.00%)     2274.35 (   0.66%)
> Hmean     32     4725.48 (   0.00%)     4735.57 (   0.21%)
> Hmean     64     4411.47 (   0.00%)     4400.05 (  -0.26%)
> Hmean     96     4284.31 (   0.00%)     4267.39 (  -0.39%)
> 
> Signed-off-by: Hao Jia <jiahao1@...iang.com>
> Suggested-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> ---
> Previous discussion link: https://lore.kernel.org/all/20241128084858.25220-1-jiahao.kernel@gmail.com
> Link to v1: https://lore.kernel.org/all/20241218080203.80556-1-jiahao.kernel@gmail.com
> 
> v1 to v2:
>  - Modify dst_cfs_rq->nr_running to dst_cfs_rq->nr_queued to
>    resolve conflicts with commit 736c55a02c47 ("sched/fair:
>    Rename cfs_rq.nr_running into nr_queued").
> 
>  kernel/sched/fair.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 5599b0c1ba9b..c884bf631e66 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9396,6 +9396,30 @@ static inline int migrate_degrades_locality(struct task_struct *p,
>  }
>  #endif
>  
> +/*
> + * Check whether the task is ineligible on the destination cpu
> + *
> + * When the PLACE_LAG scheduling feature is enabled and
> + * dst_cfs_rq->nr_queued is greater than 1, if the task
> + * is ineligible, it will also be ineligible when
> + * it is migrated to the destination cpu.
> + */
> +static inline int task_is_ineligible_on_dst_cpu(struct task_struct *p, int dest_cpu)
> +{
> +	struct cfs_rq *dst_cfs_rq;
> +
> +#ifdef CONFIG_FAIR_GROUP_SCHED
> +	dst_cfs_rq = task_group(p)->cfs_rq[dest_cpu];
> +#else
> +	dst_cfs_rq = &cpu_rq(dest_cpu)->cfs;
> +#endif
> +	if (sched_feat(PLACE_LAG) && dst_cfs_rq->nr_queued &&
> +	    !entity_eligible(task_cfs_rq(p), &p->se))
> +		return 1;
> +
> +	return 0;
> +}
> +
>  /*
>   * can_migrate_task - may task p from runqueue rq be migrated to this_cpu?
>   */
> @@ -9420,6 +9444,16 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>  	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
>  		return 0;
>  
> +	/*
> +	 * We want to prioritize the migration of eligible tasks.
> +	 * For ineligible tasks we soft-limit them and only allow
> +	 * them to migrate when nr_balance_failed is non-zero to
> +	 * avoid load-balancing trying very hard to balance the load.
> +	 */
> +	if (!env->sd->nr_balance_failed &&
> +	    task_is_ineligible_on_dst_cpu(p, env->dst_cpu))
> +		return 0;
> +
>  	/* Disregard percpu kthreads; they are where they need to be. */
>  	if (kthread_is_per_cpu(p))
>  		return 0;

Just a general comment.

If we throw tasks with custom slices into the mix, I wonder what kinds of impacts
we would see from migrating eligible tasks with very short slice lengths (they may
run earlier) or very long ones.

We could've picked an inelegible task before, which wouldn't have an impact other than
the overhead of the migration. But the patch might make it so we now pick an eligible
one with a custom slice.