Message-ID: <jhj4kj028n2.mognet@arm.com>
Date: Thu, 28 Jan 2021 18:56:01 +0000
From: Valentin Schneider <valentin.schneider@....com>
To: Tao Zhou <ouwen210@...mail.com>
Cc: linux-kernel@...r.kernel.org, tglx@...utronix.de, mingo@...nel.org,
bigeasy@...utronix.de, qais.yousef@....com, swood@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com, vincent.donnefort@....com, tj@...nel.org,
ouwen210@...mail.com
Subject: Re: [RFC PATCH] sched/core: Fix premature p->migration_pending completion
On 29/01/21 01:02, Tao Zhou wrote:
> Hi,
>
> On Wed, Jan 27, 2021 at 07:30:35PM +0000, Valentin Schneider wrote:
>
>> Fiddling some more with a TLA+ model of set_cpus_allowed_ptr() & friends
>> unearthed one more outstanding issue. This doesn't even involve
>> migrate_disable(), but rather affinity changes and execution of the stopper
>> racing with each other.
>>
>> My own interpretation of the (lengthy) TLA+ splat (note the potential for
>> errors at each level) is:
>>
>> Initial conditions:
>> victim.cpus_mask = {CPU0, CPU1}
>>
>> CPU0 CPU1 CPU<don't care>
>>
>> switch_to(victim)
>> set_cpus_allowed(victim, {CPU1})
>> kick CPU0 migration_cpu_stop({.dest_cpu = CPU1})
>> switch_to(stopper/0)
>> // e.g. CFS load balance
>> move_queued_task(CPU0, victim, CPU1);
> ^^^^^^^^^^^^^^^^
>
> Why is move_queued_task() not attach_task()/detach_task() for CFS load..
>
Heh, I expected that one; it is indeed detach_task()/attach_task() for CFS
LB. I didn't want to make this any longer than it needed to be, and I figured
that since move_queued_task() is a composition of detach_task(), attach_task()
and rq lock dances, it would get the point across.
This does raise an "interesting" point: ATM I think this issue cannot
actually involve move_queued_task(), since all current move_queued_task()
callsites are invoked either from a stopper or from set_cpus_allowed_ptr().
CFS' detach_task() + attach_task() could do it, though.
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 06b449942adf..b57326b0a742 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1923,20 +1923,28 @@ static int migration_cpu_stop(void *data)
>> complete = true;
>> }
>>
>> - /* migrate_enable() -- we must not race against SCA */
>> - if (dest_cpu < 0) {
>> - /*
>> - * When this was migrate_enable() but we no longer
>> - * have a @pending, a concurrent SCA 'fixed' things
>> - * and we should be valid again. Nothing to do.
>> - */
>> - if (!pending) {
>> - WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), &p->cpus_mask));
>> - goto out;
>> - }
>> + /*
>> + * When this was migrate_enable() but we no longer
>> + * have a @pending, a concurrent SCA 'fixed' things
>> + * and we should be valid again.
>> + *
>> + * This can also be a stopper invocation that was 'fixed' by an
>> + * earlier one.
>> + *
>> + * Nothing to do.
>> + */
>> + if ((dest_cpu < 0 || dest_cpu == cpu_of(rq)) && !pending) {
>
> When the condition 'dest_cpu == cpu_of(rq)' is true, pending is not NULL.
> The condition may be like this:
>
> if ((dest_cpu < 0 && !pending) || dest_cpu == cpu_of(rq))
>
> We want to choose one cpu in the new(currently modified) cpu_mask and
> complete all.
>
Consider the execution of migration_cpu_stop() in the above trace with
migrate_task_to(). We do have:
- dest_cpu == cpu_of(rq)
- p->migration_pending
but we do *not* want to bail out at this condition, because we need to fix
up dest_cpu.