[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d0a4beb1-3a35-a510-aa9f-813492751cc4@gmail.com>
Date: Tue, 14 Jan 2025 11:18:32 +0800
From: Hao Jia <jiahao.kernel@...il.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: mingo@...hat.com, peterz@...radead.org, mingo@...nel.org,
juri.lelli@...hat.com, dietmar.eggemann@....com, rostedt@...dmis.org,
bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
linux-kernel@...r.kernel.org, Hao Jia <jiahao1@...iang.com>
Subject: Re: [PATCH v2] sched/core: Prioritize migrating eligible tasks in
sched_balance_rq()
On 2025/1/14 00:40, Vincent Guittot wrote:
> On Mon, 13 Jan 2025 at 10:21, Hao Jia <jiahao.kernel@...il.com> wrote:
>>
>> Friendly ping...
>>
>>
>> On 2024/12/23 17:14, Hao Jia wrote:
>>> From: Hao Jia <jiahao1@...iang.com>
>>>
>>> When the PLACE_LAG scheduling feature is enabled and
>>> dst_cfs_rq->nr_queued is greater than 1, if a task is
>>> ineligible (lag < 0) on the source cpu runqueue, it will
>>> also be ineligible when it is migrated to the destination
>>> cpu runqueue. Because we will keep the original equivalent
>>> lag of the task in place_entity(). So if the task was
>>> ineligible before, it will still be ineligible after
>>> migration.
>>>
>>> So in sched_balance_rq(), we prioritize migrating eligible
>>> tasks, and we soft-limit ineligible tasks, allowing them
>>> to migrate only when nr_balance_failed is non-zero to
>>> avoid load-balancing trying very hard to balance the load.
>
> Could you explain why you think it's better to balance eligible tasks
> in priority and potentially skip a load balance ?
In place_entity(), we maintain the task's original equivalent lag, even
if we migrate the task to dst_rq, this does not change its eligibility
attribute.
When there are multiple tasks on src_rq, and the dst_cpu has some
runnable tasks, migrating ineligible tasks to dst_rq will not allow them
to run. Therefore, such task migration is inefficient. We should
prioritize migrating tasks that can run on dst_rq.
In other words, migrating ineligible tasks is merely moving them to
another runqueue to wait until they become eligible.
>
> I can see an interest for idle and newly_idle load balance in order to
> favor fairness as tasks will become eligible but I don't see why it
> would be helpful if dst already has some runnable tasks. Furthermore,
> when a cpu is idle or newly idle, we really want to migrate a task
> even an non eligible one instead of possibly skipping this load
> balance round. With your patch, we might end up not pulling any task,
> increasing the nr_balance_failed and waiting next load balance
>
If I understand correctly, when the destination CPU is idle, my patch
does not change the original behavior. it only prevents the migration of
ineligible tasks when dst_cfs_rq->nr_queued is greater than 1.
If I missed something, please correct me.
Thanks,
Hao
>>>
>>> Below are some benchmark test results. From my test results,
>>> this patch shows a slight improvement on hackbench.
>>>
>>> Benchmark
>>> =========
>>>
>>> All of the benchmarks are done inside a normal cpu cgroup in a
>>> clean environment with cpu turbo disabled, and test machine is:
>>>
>>> Single NUMA machine model is 13th Gen Intel(R) Core(TM)
>>> i7-13700, 12 Core/24 HT.
>>>
>>> Based on master b86545e02e8c.
>>>
>>> Results
>>> =======
>>>
>>> hackbench-process-pipes
>>> vanilla patched
>>> Amean 1 0.5837 ( 0.00%) 0.5733 ( 1.77%)
>>> Amean 4 1.4423 ( 0.00%) 1.4503 ( -0.55%)
>>> Amean 7 2.5147 ( 0.00%) 2.4773 ( 1.48%)
>>> Amean 12 3.9347 ( 0.00%) 3.8880 ( 1.19%)
>>> Amean 21 5.3943 ( 0.00%) 5.3873 ( 0.13%)
>>> Amean 30 6.7840 ( 0.00%) 6.6660 ( 1.74%)
>>> Amean 48 9.8313 ( 0.00%) 9.6100 ( 2.25%)
>>> Amean 79 15.4403 ( 0.00%) 14.9580 ( 3.12%)
>>> Amean 96 18.4970 ( 0.00%) 17.9533 ( 2.94%)
>>>
>>> hackbench-process-sockets
>>> vanilla patched
>>> Amean 1 0.6297 ( 0.00%) 0.6223 ( 1.16%)
>>> Amean 4 2.1517 ( 0.00%) 2.0887 ( 2.93%)
>>> Amean 7 3.6377 ( 0.00%) 3.5670 ( 1.94%)
>>> Amean 12 6.1277 ( 0.00%) 5.9290 ( 3.24%)
>>> Amean 21 10.0380 ( 0.00%) 9.7623 ( 2.75%)
>>> Amean 30 14.1517 ( 0.00%) 13.7513 ( 2.83%)
>>> Amean 48 24.7253 ( 0.00%) 24.2287 ( 2.01%)
>>> Amean 79 43.9523 ( 0.00%) 43.2330 ( 1.64%)
>>> Amean 96 54.5310 ( 0.00%) 53.7650 ( 1.40%)
>>>
>>> tbench4 Throughput
>>> vanilla patched
>>> Hmean 1 255.97 ( 0.00%) 275.01 ( 7.44%)
>>> Hmean 2 511.60 ( 0.00%) 544.27 ( 6.39%)
>>> Hmean 4 996.70 ( 0.00%) 1006.57 ( 0.99%)
>>> Hmean 8 1646.46 ( 0.00%) 1649.15 ( 0.16%)
>>> Hmean 16 2259.42 ( 0.00%) 2274.35 ( 0.66%)
>>> Hmean 32 4725.48 ( 0.00%) 4735.57 ( 0.21%)
>>> Hmean 64 4411.47 ( 0.00%) 4400.05 ( -0.26%)
>>> Hmean 96 4284.31 ( 0.00%) 4267.39 ( -0.39%)
>>>
>>> Signed-off-by: Hao Jia <jiahao1@...iang.com>
>>> Suggested-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>>> ---
>>> Previous discussion link: https://lore.kernel.org/all/20241128084858.25220-1-jiahao.kernel@gmail.com
>>> Link to v1: https://lore.kernel.org/all/20241218080203.80556-1-jiahao.kernel@gmail.com
>>>
>>> v1 to v2:
>>> - Modify dst_cfs_rq->nr_running to dst_cfs_rq->nr_queued to
>>> resolve conflicts with commit 736c55a02c47 ("sched/fair:
>>> Rename cfs_rq.nr_running into nr_queued").
>>>
>>> kernel/sched/fair.c | 34 ++++++++++++++++++++++++++++++++++
>>> 1 file changed, 34 insertions(+)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 5599b0c1ba9b..c884bf631e66 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -9396,6 +9396,30 @@ static inline int migrate_degrades_locality(struct task_struct *p,
>>> }
>>> #endif
>>>
>>> +/*
>>> + * Check whether the task is ineligible on the destination cpu
>>> + *
>>> + * When the PLACE_LAG scheduling feature is enabled and
>>> + * dst_cfs_rq->nr_queued is greater than 1, if the task
>>> + * is ineligible, it will also be ineligible when
>>> + * it is migrated to the destination cpu.
>>> + */
>>> +static inline int task_is_ineligible_on_dst_cpu(struct task_struct *p, int dest_cpu)
>>> +{
>>> + struct cfs_rq *dst_cfs_rq;
>>> +
>>> +#ifdef CONFIG_FAIR_GROUP_SCHED
>>> + dst_cfs_rq = task_group(p)->cfs_rq[dest_cpu];
>>> +#else
>>> + dst_cfs_rq = &cpu_rq(dest_cpu)->cfs;
>>> +#endif
>>> + if (sched_feat(PLACE_LAG) && dst_cfs_rq->nr_queued &&
>>> + !entity_eligible(task_cfs_rq(p), &p->se))
>>> + return 1;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> /*
>>> * can_migrate_task - may task p from runqueue rq be migrated to this_cpu?
>>> */
>>> @@ -9420,6 +9444,16 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>>> if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
>>> return 0;
>>>
>>> + /*
>>> + * We want to prioritize the migration of eligible tasks.
>>> + * For ineligible tasks we soft-limit them and only allow
>>> + * them to migrate when nr_balance_failed is non-zero to
>>> + * avoid load-balancing trying very hard to balance the load.
>>> + */
>>> + if (!env->sd->nr_balance_failed &&
>>> + task_is_ineligible_on_dst_cpu(p, env->dst_cpu))
>>> + return 0;
>>> +
>>> /* Disregard percpu kthreads; they are where they need to be. */
>>> if (kthread_is_per_cpu(p))
>>> return 0;
Powered by blists - more mailing lists