[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2cf94373-4f25-4a33-a0b4-cab04031bae7@virtuozzo.com>
Date: Mon, 25 Dec 2023 16:17:49 +0100
From: Konstantin Khorenko <khorenko@...tuozzo.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Daniel Bristot de Oliveira
<bristot@...hat.com>, Valentin Schneider <vschneid@...hat.com>,
Alexander Atanasov <alexander.atanasov@...tuozzo.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/fair: Do not scan non-movable tasks several times
On 18.12.2023 17:48, Vincent Guittot wrote:
> On Thu, 14 Dec 2023 at 15:42, Konstantin Khorenko
> <khorenko@...tuozzo.com> wrote:
>>
>> If busiest rq is small, nr_running < SCHED_NR_MIGRATE_BREAK and all
>> tasks are not movable, detach_tasks() should not iterate more than tasks
>> available in the busiest rq.
>>
>> Previously the (env->loop > env->loop_max) condition prevented us from
>
> It's usually better to give the commit directly when we know it :
> Before commit : b0defa7ae03e ("sched/fair: Make sure to try to detach
> at least one movable task"),
> the (env->loop > env->loop_max) condition prevented us from ...
You are definitely right, added.
>> scanning non-movable tasks more than rq size times, but after we start
>> checking the LBF_ALL_PINNED flag, the "all tasks are not movable" case
>> is under threat.
>>
>
> Fixes: b0defa7ae03e ("sched/fair: Make sure to try to detach at least
> one movable task")
Added this too, thank you.
>> Signed-off-by: Konstantin Khorenko <khorenko@...tuozzo.com>
>> ---
>> kernel/sched/fair.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index d7a3c63a2171..faa2a765e899 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -11219,7 +11219,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>> .dst_rq = this_rq,
>> .dst_grpmask = group_balance_mask(sd->groups),
>> .idle = idle,
>> - .loop_break = SCHED_NR_MIGRATE_BREAK,
>> .cpus = cpus,
>> .fbq_type = all,
>> .tasks = LIST_HEAD_INIT(env.tasks),
>> @@ -11265,6 +11264,12 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>> * correctly treated as an imbalance.
>> */
>> env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
>> + /*
>> + * If busiest rq is small, nr_running < SCHED_NR_MIGRATE_BREAK
>> + * and all tasks are not movable, detach_tasks() should not
>> + * iterate more than tasks available in rq.
>> + */
>> + env.loop_break = min(SCHED_NR_MIGRATE_BREAK, busiest->nr_running);
>
> Should it be after more_balance: ?
> In case we do "more_balance:" on a new_dst_cpu and it ends up that
> finally there is no more movable task as we released the lock of
> busiest rq in the meantime ?
Well, both yes and no here.
If we simply move the env.loop_break assignment after "more_balance" label,
we will break the handling LBF_NEED_BREAK case.
But you are right, the new_dst_cpu case should also not just reset loop_break to
SCHED_NR_MIGRATE_BREAK, but do the same min() calculation.
So i've added one more label before the env.loop_break = min(...) and used it in
the new_dst_cpu case.
> Also you can remove one more superfluous init of loop_break:
Yep, dropped it, thank you.
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -11361,7 +11361,6 @@ static int load_balance(int this_cpu, struct
> rq *this_rq,
> */
> if (!cpumask_subset(cpus, env.dst_grpmask)) {
> env.loop = 0;
> - env.loop_break = SCHED_NR_MIGRATE_BREAK;
> goto redo;
> }
> goto out_all_pinned;
>
>>
>> more_balance:
>> rq_lock_irqsave(busiest, &rf);
>> --
>> 2.39.3
>>
Powered by blists - more mailing lists