[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <28cb1f15-137e-41f9-9b7d-fa2daa70c774@amperemail.onmicrosoft.com>
Date: Tue, 22 Jul 2025 11:26:15 +0800
From: Shijie Huang <shijie@...eremail.onmicrosoft.com>
To: Valentin Schneider <vschneid@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Huang Shijie <shijie@...amperecomputing.com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
patches@...erecomputing.com, cl@...ux.com, Shubhang@...amperecomputing.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] sched/fair: do not scan twice in detach_tasks()
On 2025/7/21 19:25, Valentin Schneider wrote:
> On 21/07/25 11:40, Vincent Guittot wrote:
>> On Mon, 21 Jul 2025 at 04:40, Huang Shijie
>> <shijie@...amperecomputing.com> wrote:
>>> detach_tasks() uses struct lb_env.loop_max as an env.src_rq->cfs_tasks
>>> iteration count limit. It is however set without the source RQ lock held,
>>> and besides detach_tasks() can be re-invoked after releasing and
>>> re-acquiring the RQ lock per LBF_NEED_BREAK.
>>>
>>> This means that env.loop_max and the actual length of env.src_rq->cfs_tasks
>>> as observed within detach_tasks() can differ. This can cause some tasks to
>> why not setting env.loop_max only once rq lock is taken in this case ?
>>
>> side note : by default loop_max <= loop_break
>>
> I thought so too and dismissed that due to LBF_NEED_BREAK, but I guess we
> could still do something like:
>
> ---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index b9b4bbbf0af6f..eef3a0d341661 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11643,6 +11643,7 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
> .dst_grpmask = group_balance_mask(sd->groups),
> .idle = idle,
> .loop_break = SCHED_NR_MIGRATE_BREAK,
> + .loop_max = UINT_MAX,
> .cpus = cpus,
> .fbq_type = all,
> .tasks = LIST_HEAD_INIT(env.tasks),
> @@ -11681,18 +11682,19 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
> /* Clear this flag as soon as we find a pullable task */
> env.flags |= LBF_ALL_PINNED;
> if (busiest->nr_running > 1) {
> +more_balance:
> /*
> * Attempt to move tasks. If sched_balance_find_src_group has found
> * an imbalance but busiest->nr_running <= 1, the group is
> * still unbalanced. ld_moved simply stays zero, so it is
> * correctly treated as an imbalance.
> */
> - env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
> -
> -more_balance:
> rq_lock_irqsave(busiest, &rf);
> update_rq_clock(busiest);
>
> +
> + env.loop_max = min3(env.loop_max, sysctl_sched_nr_migrate, busiest->h_nr_running);
It should be busiest->nr_running? or businest->cfs.h_nr_queued?
do you mind I create a patch based on this one? or You create an
official patch?
Thanks
Huang Shijie
Powered by blists - more mailing lists