[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <xhsmh7cvgnbdf.mognet@vschneid.remote.csb>
Date: Thu, 16 Mar 2023 12:12:28 +0000
From: Valentin Schneider <vschneid@...hat.com>
To: Yicong Yang <yangyicong@...wei.com>, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com,
vincent.guittot@...aro.org, linux-kernel@...r.kernel.org
Cc: yangyicong@...ilicon.com, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com, linuxarm@...wei.com, prime.zeng@...wei.com,
wangjie125@...wei.com
Subject: Re: [PATCH] sched/fair: Don't balance migration disabled tasks
On 16/03/23 17:13, Yicong Yang wrote:
> Hi Valentin,
>
> On 2023/3/15 23:34, Valentin Schneider wrote:
>> That cpumask check should cover migration_disabled tasks, unless they
>> haven't gone through migrate_disable_switch() yet
>> (p->migration_disabled == 1, but the cpus_ptr hasn't been touched yet).
>>
>> Now, if that's the case, the task has to be src_rq's current (since it
>> hasn't switched out), which means can_migrate_task() should exit via:
>>
>> if (task_on_cpu(env->src_rq, p)) {
>> schedstat_inc(p->stats.nr_failed_migrations_running);
>> return 0;
>> }
>>
>> and thus not try to detach_task(). With that in mind, I don't get how your
>> splat can happen, nor how the change change can help (a remote task p could
>> execute migrate_disable() concurrently with can_migrate_task(p)).
>>
>
> I see, for migrate disabled tasks, if !p->on_cpu the migration can be avoid by
> the cpus_ptr check and if p->on_cpu migration can be avoid by the task_on_cpu()
> check. So this patch won't help.
>
Right.
>> I'm a bit confused here, detach_tasks() happens entirely with src_rq
>> rq_lock held, so there shouldn't be any surprises.
>>
>
> Since it's a arm64 machine, could the WARN_ON_ONCE() test be false positive?
> I mean the update of p->migration_disabled is not observed by the balance
> CPU and trigger this warning incorrectly.
>
Since the balance CPU holds the src_rq's rq_lock for the entire duration of
detach_tasks(), the actual value of p->migration_disabled shouldn't matter.
Either p has been scheduled out while being migration_disabled, in which
case its ->cpus_ptr has been updated, or p is still running, in which case
can_migrate_task() should return false (p->on_cpu). But from your report,
we're somehow not seeing one of these.
>> Can you share any extra context? E.g. exact HEAD of your tree, maybe the
>> migrate_disable task in question if you have that info.
>>
>
> We met this on our internal version based on 6.1-rc4, the scheduler is not
> patched. We also saw this in previous version like 6.0. This patch is applied
> since 6.2 so we haven't seen this recently, but as you point out this patch doesn't
> solve the problem.
Could you try to reproduce this on an unpatched upstream kernel?
Powered by blists - more mailing lists