Message-ID: <d0a4beb1-3a35-a510-aa9f-813492751cc4@gmail.com>
Date: Tue, 14 Jan 2025 11:18:32 +0800
From: Hao Jia <jiahao.kernel@...il.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: mingo@...hat.com, peterz@...radead.org, mingo@...nel.org,
 juri.lelli@...hat.com, dietmar.eggemann@....com, rostedt@...dmis.org,
 bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
 linux-kernel@...r.kernel.org, Hao Jia <jiahao1@...iang.com>
Subject: Re: [PATCH v2] sched/core: Prioritize migrating eligible tasks in
 sched_balance_rq()



On 2025/1/14 00:40, Vincent Guittot wrote:
> On Mon, 13 Jan 2025 at 10:21, Hao Jia <jiahao.kernel@...il.com> wrote:
>>
>> Friendly ping...
>>
>>
>> On 2024/12/23 17:14, Hao Jia wrote:
>>> From: Hao Jia <jiahao1@...iang.com>
>>>
>>> When the PLACE_LAG scheduling feature is enabled and
>>> dst_cfs_rq->nr_queued is greater than 1, a task that is
>>> ineligible (lag < 0) on the source cpu runqueue will
>>> also be ineligible when it is migrated to the destination
>>> cpu runqueue, because place_entity() preserves the task's
>>> original equivalent lag. So if the task was ineligible
>>> before, it will still be ineligible after migration.
>>>
>>> So in sched_balance_rq(), we prioritize migrating eligible
>>> tasks and soft-limit ineligible tasks, allowing them to
>>> migrate only when nr_balance_failed is non-zero, to avoid
>>> the load balancer trying too hard to balance the load.
> 
> Could you explain why you think it's better to prioritize balancing
> eligible tasks and potentially skip a load balance?

In place_entity(), we maintain the task's original equivalent lag;
even if we migrate the task to dst_rq, its eligibility does not
change.
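
For illustration, here is a simplified sketch of that behavior (not
the exact kernel code; the real place_entity() additionally scales
the lag by load weights and handles the !PLACE_LAG case):

	/*
	 * Sketch: the lag (se->vlag) recorded at dequeue time is
	 * re-applied relative to the new runqueue's weighted average
	 * vruntime, so the sign of the lag, and hence eligibility,
	 * survives a migration.
	 */
	static void place_entity_sketch(struct cfs_rq *cfs_rq,
					struct sched_entity *se)
	{
		u64 vruntime = avg_vruntime(cfs_rq);	/* queue average V */
		s64 lag = se->vlag;			/* saved at dequeue */

		se->vruntime = vruntime - lag;	/* lag < 0 => vruntime > V */
	}

So an ineligible task (lag < 0) ends up placed after the average V on
whichever runqueue it lands on.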

When there are multiple tasks on src_rq and dst_cpu already has some
runnable tasks, migrating an ineligible task to dst_rq will not allow
it to run there. Such a migration is therefore inefficient; we should
prioritize migrating tasks that can actually run on dst_rq.

In other words, migrating ineligible tasks is merely moving them to 
another runqueue to wait until they become eligible.
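
For reference, a conceptual view of the eligibility test (the real
entity_eligible()/vruntime_eligible() works on load-weight-scaled
sums, so this is only an approximation):

	/*
	 * Sketch: under EEVDF a task is eligible iff its lag is
	 * non-negative, i.e. its vruntime is at or before the
	 * queue's weighted average vruntime V.
	 */
	static bool entity_eligible_sketch(struct cfs_rq *cfs_rq,
					   struct sched_entity *se)
	{
		return (s64)(avg_vruntime(cfs_rq) - se->vruntime) >= 0;
	}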


> 
> I can see the interest for idle and newly-idle load balance, in order
> to favor fairness as tasks will become eligible, but I don't see why
> it would be helpful if dst already has some runnable tasks.
> Furthermore, when a cpu is idle or newly idle, we really want to
> migrate a task, even a non-eligible one, instead of possibly skipping
> this load-balance round. With your patch, we might end up not pulling
> any task, increasing nr_balance_failed and waiting for the next load
> balance.
> 

If I understand correctly, when the destination CPU is idle, my patch
does not change the original behavior. It only prevents the migration
of ineligible tasks when dst_cfs_rq->nr_queued is greater than 1.
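
Concretely, in task_is_ineligible_on_dst_cpu() from the patch below,
an idle destination means dst_cfs_rq->nr_queued == 0, so the condition
short-circuits before entity_eligible() is even checked:

	/* idle dst: nr_queued == 0, the && chain short-circuits and
	 * the helper returns 0, so migration is never blocked */
	if (sched_feat(PLACE_LAG) && dst_cfs_rq->nr_queued &&
	    !entity_eligible(task_cfs_rq(p), &p->se))
		return 1;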

If I missed something, please correct me.


Thanks,
Hao


>>>
>>> Below are some benchmark test results. From my test results,
>>> this patch shows a slight improvement on hackbench.
>>>
>>> Benchmark
>>> =========
>>>
>>> All of the benchmarks are done inside a normal cpu cgroup in a
>>> clean environment with cpu turbo disabled, and test machine is:
>>>
>>> Single NUMA machine model is 13th Gen Intel(R) Core(TM)
>>> i7-13700, 12 Core/24 HT.
>>>
>>> Based on master b86545e02e8c.
>>>
>>> Results
>>> =======
>>>
>>> hackbench-process-pipes
>>>                         vanilla                  patched
>>> Amean     1       0.5837 (   0.00%)      0.5733 (   1.77%)
>>> Amean     4       1.4423 (   0.00%)      1.4503 (  -0.55%)
>>> Amean     7       2.5147 (   0.00%)      2.4773 (   1.48%)
>>> Amean     12      3.9347 (   0.00%)      3.8880 (   1.19%)
>>> Amean     21      5.3943 (   0.00%)      5.3873 (   0.13%)
>>> Amean     30      6.7840 (   0.00%)      6.6660 (   1.74%)
>>> Amean     48      9.8313 (   0.00%)      9.6100 (   2.25%)
>>> Amean     79     15.4403 (   0.00%)     14.9580 (   3.12%)
>>> Amean     96     18.4970 (   0.00%)     17.9533 (   2.94%)
>>>
>>> hackbench-process-sockets
>>>                         vanilla                  patched
>>> Amean     1       0.6297 (   0.00%)      0.6223 (   1.16%)
>>> Amean     4       2.1517 (   0.00%)      2.0887 (   2.93%)
>>> Amean     7       3.6377 (   0.00%)      3.5670 (   1.94%)
>>> Amean     12      6.1277 (   0.00%)      5.9290 (   3.24%)
>>> Amean     21     10.0380 (   0.00%)      9.7623 (   2.75%)
>>> Amean     30     14.1517 (   0.00%)     13.7513 (   2.83%)
>>> Amean     48     24.7253 (   0.00%)     24.2287 (   2.01%)
>>> Amean     79     43.9523 (   0.00%)     43.2330 (   1.64%)
>>> Amean     96     54.5310 (   0.00%)     53.7650 (   1.40%)
>>>
>>> tbench4 Throughput
>>>                         vanilla                  patched
>>> Hmean     1       255.97 (   0.00%)      275.01 (   7.44%)
>>> Hmean     2       511.60 (   0.00%)      544.27 (   6.39%)
>>> Hmean     4       996.70 (   0.00%)     1006.57 (   0.99%)
>>> Hmean     8      1646.46 (   0.00%)     1649.15 (   0.16%)
>>> Hmean     16     2259.42 (   0.00%)     2274.35 (   0.66%)
>>> Hmean     32     4725.48 (   0.00%)     4735.57 (   0.21%)
>>> Hmean     64     4411.47 (   0.00%)     4400.05 (  -0.26%)
>>> Hmean     96     4284.31 (   0.00%)     4267.39 (  -0.39%)
>>>
>>> Signed-off-by: Hao Jia <jiahao1@...iang.com>
>>> Suggested-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>>> ---
>>> Previous discussion link: https://lore.kernel.org/all/20241128084858.25220-1-jiahao.kernel@gmail.com
>>> Link to v1: https://lore.kernel.org/all/20241218080203.80556-1-jiahao.kernel@gmail.com
>>>
>>> v1 to v2:
>>>    - Modify dst_cfs_rq->nr_running to dst_cfs_rq->nr_queued to
>>>      resolve conflicts with commit 736c55a02c47 ("sched/fair:
>>>      Rename cfs_rq.nr_running into nr_queued").
>>>
>>>    kernel/sched/fair.c | 34 ++++++++++++++++++++++++++++++++++
>>>    1 file changed, 34 insertions(+)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 5599b0c1ba9b..c884bf631e66 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -9396,6 +9396,30 @@ static inline int migrate_degrades_locality(struct task_struct *p,
>>>    }
>>>    #endif
>>>
>>> +/*
>>> + * Check whether the task is ineligible on the destination cpu
>>> + *
>>> + * When the PLACE_LAG scheduling feature is enabled and
>>> + * dst_cfs_rq->nr_queued is greater than 1, if the task
>>> + * is ineligible, it will also be ineligible when
>>> + * it is migrated to the destination cpu.
>>> + */
>>> +static inline int task_is_ineligible_on_dst_cpu(struct task_struct *p, int dest_cpu)
>>> +{
>>> +     struct cfs_rq *dst_cfs_rq;
>>> +
>>> +#ifdef CONFIG_FAIR_GROUP_SCHED
>>> +     dst_cfs_rq = task_group(p)->cfs_rq[dest_cpu];
>>> +#else
>>> +     dst_cfs_rq = &cpu_rq(dest_cpu)->cfs;
>>> +#endif
>>> +     if (sched_feat(PLACE_LAG) && dst_cfs_rq->nr_queued &&
>>> +         !entity_eligible(task_cfs_rq(p), &p->se))
>>> +             return 1;
>>> +
>>> +     return 0;
>>> +}
>>> +
>>>    /*
>>>     * can_migrate_task - may task p from runqueue rq be migrated to this_cpu?
>>>     */
>>> @@ -9420,6 +9444,16 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>>>        if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
>>>                return 0;
>>>
>>> +     /*
>>> +      * We want to prioritize the migration of eligible tasks.
>>> +      * For ineligible tasks we soft-limit them and only allow
>>> +      * them to migrate when nr_balance_failed is non-zero to
>>> +      * avoid load-balancing trying very hard to balance the load.
>>> +      */
>>> +     if (!env->sd->nr_balance_failed &&
>>> +         task_is_ineligible_on_dst_cpu(p, env->dst_cpu))
>>> +             return 0;
>>> +
>>>        /* Disregard percpu kthreads; they are where they need to be. */
>>>        if (kthread_is_per_cpu(p))
>>>                return 0;
