lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 16 Mar 2023 17:13:54 +0800
From:   Yicong Yang <yangyicong@...wei.com>
To:     Valentin Schneider <vschneid@...hat.com>, <mingo@...hat.com>,
        <peterz@...radead.org>, <juri.lelli@...hat.com>,
        <vincent.guittot@...aro.org>, <linux-kernel@...r.kernel.org>
CC:     <yangyicong@...ilicon.com>, <dietmar.eggemann@....com>,
        <rostedt@...dmis.org>, <bsegall@...gle.com>, <mgorman@...e.de>,
        <bristot@...hat.com>, <linuxarm@...wei.com>,
        <prime.zeng@...wei.com>, <wangjie125@...wei.com>
Subject: Re: [PATCH] sched/fair: Don't balance migration disabled tasks

Hi Valentin,

On 2023/3/15 23:34, Valentin Schneider wrote:
> On 13/03/23 14:57, Yicong Yang wrote:
>>  kernel/sched/fair.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 7a1b1f855b96..8fe767362d22 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8433,6 +8433,10 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>>       if (kthread_is_per_cpu(p))
>>               return 0;
>>
>> +	/* Migration disabled tasks need to be kept on their running CPU. */
>> +	if (is_migration_disabled(p))
>> +		return 0;
>> +
>>       if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
>>               int cpu;
> 
> That cpumask check should cover migration_disabled tasks, unless they
> haven't gone through migrate_disable_switch() yet
> (p->migration_disabled == 1, but the cpus_ptr hasn't been touched yet).
> 
> Now, if that's the case, the task has to be src_rq's current (since it
> hasn't switched out), which means can_migrate_task() should exit via:
> 
>         if (task_on_cpu(env->src_rq, p)) {
>                 schedstat_inc(p->stats.nr_failed_migrations_running);
>                 return 0;
>         }
> 
> and thus not try to detach_task(). With that in mind, I don't get how your
> splat can happen, nor how the change change can help (a remote task p could
> execute migrate_disable() concurrently with can_migrate_task(p)).
> 

I see, for migrate disabled tasks, if !p->on_cpu the migration can be avoid by
the cpus_ptr check and if p->on_cpu migration can be avoid by the task_on_cpu()
check. So this patch won't help.

> I'm a bit confused here, detach_tasks() happens entirely with src_rq
> rq_lock held, so there shouldn't be any surprises.
> 

Since it's a arm64 machine, could the WARN_ON_ONCE() test be false positive?
I mean the update of p->migration_disabled is not observed by the balance
CPU and trigger this warning incorrectly.

> Can you share any extra context? E.g. exact HEAD of your tree, maybe the
> migrate_disable task in question if you have that info.
> 

We met this on our internal version based on 6.1-rc4, the scheduler is not
patched. We also saw this in previous version like 6.0. This patch is applied
since 6.2 so we haven't seen this recently, but as you point out this patch doesn't
solve the problem. The questioned tasks are some systemd services(udevd, etc)
and I assume the migration disable is caused by seccomp:

seccomp()
	...
	bpf_prog_run_pin_on_cpu()
		migrate_disable()
		...
		migrate_enable()

Thanks,
Yicong

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ