Message-ID: <CAKfTPtAf2dJmdDxnznA9GH2DOp2C0FKkmyMQwFjzOFiXNq7YJA@mail.gmail.com>
Date: Mon, 13 Jan 2025 17:40:23 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Hao Jia <jiahao.kernel@...il.com>
Cc: mingo@...hat.com, peterz@...radead.org, mingo@...nel.org,
juri.lelli@...hat.com, dietmar.eggemann@....com, rostedt@...dmis.org,
bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
linux-kernel@...r.kernel.org, Hao Jia <jiahao1@...iang.com>
Subject: Re: [PATCH v2] sched/core: Prioritize migrating eligible tasks in sched_balance_rq()
On Mon, 13 Jan 2025 at 10:21, Hao Jia <jiahao.kernel@...il.com> wrote:
>
> Friendly ping...
>
>
> On 2024/12/23 17:14, Hao Jia wrote:
> > From: Hao Jia <jiahao1@...iang.com>
> >
> > When the PLACE_LAG scheduling feature is enabled and
> > dst_cfs_rq->nr_queued is greater than 1, a task that is
> > ineligible (lag < 0) on the source cpu runqueue will also
> > be ineligible when it is migrated to the destination cpu
> > runqueue, because place_entity() preserves the task's
> > original equivalent lag. So a task that was ineligible
> > before the migration remains ineligible afterwards.
> >
> > Therefore, in sched_balance_rq() we prioritize migrating
> > eligible tasks and soft-limit ineligible tasks, allowing
> > them to migrate only when nr_balance_failed is non-zero,
> > to avoid load balancing trying very hard to balance the
> > load.
Could you explain why you think it's better to prioritize balancing
eligible tasks and potentially skip a load balance?
I can see an interest for idle and newly_idle load balance, in order
to favor fairness since tasks will become eligible, but I don't see
why it would be helpful if dst already has some runnable tasks.
Furthermore, when a cpu is idle or newly idle, we really want to
migrate a task, even a non-eligible one, instead of possibly skipping
this load balance round. With your patch, we might end up not pulling
any task, increasing nr_balance_failed and waiting for the next load
balance round.
> > Below are some benchmark test results. From my test results,
> > this patch shows a slight improvement on hackbench.
> >
> > Benchmark
> > =========
> >
> > All of the benchmarks are run inside a normal cpu cgroup in a
> > clean environment with cpu turbo disabled. The test machine is:
> >
> > Single-NUMA machine, 13th Gen Intel(R) Core(TM) i7-13700,
> > 12 cores / 24 HT.
> >
> > Based on master b86545e02e8c.
> >
> > Results
> > =======
> >
> > hackbench-process-pipes
> > vanilla patched
> > Amean 1 0.5837 ( 0.00%) 0.5733 ( 1.77%)
> > Amean 4 1.4423 ( 0.00%) 1.4503 ( -0.55%)
> > Amean 7 2.5147 ( 0.00%) 2.4773 ( 1.48%)
> > Amean 12 3.9347 ( 0.00%) 3.8880 ( 1.19%)
> > Amean 21 5.3943 ( 0.00%) 5.3873 ( 0.13%)
> > Amean 30 6.7840 ( 0.00%) 6.6660 ( 1.74%)
> > Amean 48 9.8313 ( 0.00%) 9.6100 ( 2.25%)
> > Amean 79 15.4403 ( 0.00%) 14.9580 ( 3.12%)
> > Amean 96 18.4970 ( 0.00%) 17.9533 ( 2.94%)
> >
> > hackbench-process-sockets
> > vanilla patched
> > Amean 1 0.6297 ( 0.00%) 0.6223 ( 1.16%)
> > Amean 4 2.1517 ( 0.00%) 2.0887 ( 2.93%)
> > Amean 7 3.6377 ( 0.00%) 3.5670 ( 1.94%)
> > Amean 12 6.1277 ( 0.00%) 5.9290 ( 3.24%)
> > Amean 21 10.0380 ( 0.00%) 9.7623 ( 2.75%)
> > Amean 30 14.1517 ( 0.00%) 13.7513 ( 2.83%)
> > Amean 48 24.7253 ( 0.00%) 24.2287 ( 2.01%)
> > Amean 79 43.9523 ( 0.00%) 43.2330 ( 1.64%)
> > Amean 96 54.5310 ( 0.00%) 53.7650 ( 1.40%)
> >
> > tbench4 Throughput
> > vanilla patched
> > Hmean 1 255.97 ( 0.00%) 275.01 ( 7.44%)
> > Hmean 2 511.60 ( 0.00%) 544.27 ( 6.39%)
> > Hmean 4 996.70 ( 0.00%) 1006.57 ( 0.99%)
> > Hmean 8 1646.46 ( 0.00%) 1649.15 ( 0.16%)
> > Hmean 16 2259.42 ( 0.00%) 2274.35 ( 0.66%)
> > Hmean 32 4725.48 ( 0.00%) 4735.57 ( 0.21%)
> > Hmean 64 4411.47 ( 0.00%) 4400.05 ( -0.26%)
> > Hmean 96 4284.31 ( 0.00%) 4267.39 ( -0.39%)
> >
> > Signed-off-by: Hao Jia <jiahao1@...iang.com>
> > Suggested-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> > ---
> > Previous discussion link: https://lore.kernel.org/all/20241128084858.25220-1-jiahao.kernel@gmail.com
> > Link to v1: https://lore.kernel.org/all/20241218080203.80556-1-jiahao.kernel@gmail.com
> >
> > v1 to v2:
> > - Modify dst_cfs_rq->nr_running to dst_cfs_rq->nr_queued to
> > resolve conflicts with commit 736c55a02c47 ("sched/fair:
> > Rename cfs_rq.nr_running into nr_queued").
> >
> > kernel/sched/fair.c | 34 ++++++++++++++++++++++++++++++++++
> > 1 file changed, 34 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 5599b0c1ba9b..c884bf631e66 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9396,6 +9396,30 @@ static inline int migrate_degrades_locality(struct task_struct *p,
> > }
> > #endif
> >
> > +/*
> > + * Check whether the task is ineligible on the destination cpu
> > + *
> > + * When the PLACE_LAG scheduling feature is enabled and
> > + * dst_cfs_rq->nr_queued is greater than 1, if the task
> > + * is ineligible, it will also be ineligible when
> > + * it is migrated to the destination cpu.
> > + */
> > +static inline int task_is_ineligible_on_dst_cpu(struct task_struct *p, int dest_cpu)
> > +{
> > + struct cfs_rq *dst_cfs_rq;
> > +
> > +#ifdef CONFIG_FAIR_GROUP_SCHED
> > + dst_cfs_rq = task_group(p)->cfs_rq[dest_cpu];
> > +#else
> > + dst_cfs_rq = &cpu_rq(dest_cpu)->cfs;
> > +#endif
> > + if (sched_feat(PLACE_LAG) && dst_cfs_rq->nr_queued &&
> > + !entity_eligible(task_cfs_rq(p), &p->se))
> > + return 1;
> > +
> > + return 0;
> > +}
> > +
> > /*
> > * can_migrate_task - may task p from runqueue rq be migrated to this_cpu?
> > */
> > @@ -9420,6 +9444,16 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
> > if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
> > return 0;
> >
> > + /*
> > + * We want to prioritize the migration of eligible tasks.
> > + * For ineligible tasks we soft-limit them and only allow
> > + * them to migrate when nr_balance_failed is non-zero to
> > + * avoid load-balancing trying very hard to balance the load.
> > + */
> > + if (!env->sd->nr_balance_failed &&
> > + task_is_ineligible_on_dst_cpu(p, env->dst_cpu))
> > + return 0;
> > +
> > /* Disregard percpu kthreads; they are where they need to be. */
> > if (kthread_is_per_cpu(p))
> > return 0;
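
(Side note for readers, not part of the patch: the eligibility test the
patch relies on reduces to the task's lag being non-negative. A
simplified sketch of that idea, ignoring the load-weighted averaging
done by the real entity_eligible():)

/*
 * Simplified sketch only: an entity is eligible when its lag, i.e. the
 * queue's average vruntime minus its own vruntime, is non-negative.
 * place_entity() re-applies the vlag recorded at dequeue, so a negative
 * lag follows the task to the destination cpu.
 */
static inline bool sketch_entity_eligible(s64 avg_vruntime, s64 se_vruntime)
{
	s64 lag = avg_vruntime - se_vruntime;

	return lag >= 0;	/* lag < 0 => still ineligible on dst */
}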