linux-kernel - Re: smp_mb__after_spinlock requirement too strong?

Message-ID: <CAJDTihxxhy7zmhTJ-ky4wvMby_o9y8UOcs9R1dABN7NccekAiQ@mail.gmail.com>
Date:   Mon, 12 Mar 2018 16:18:00 +0800
From:   焦晓冬 <milestonejxd@...il.com>
To:     Boqun Feng <boqun.feng@...il.com>
Cc:     linux-kernel@...r.kernel.org, peterz@...radead.org,
        Alan Stern <stern@...land.harvard.edu>, will.deacon@....com,
        torvalds@...ux-foundation.org, npiggin@...il.com, mingo@...nel.org,
        mpe@...erman.id.au, oleg@...hat.com, benh@...nel.crashing.org,
        Paul McKenney <paulmck@...ux.vnet.ibm.com>
Subject: Re: smp_mb__after_spinlock requirement too strong?

>> Peter pointed out in this patch https://patchwork.kernel.org/patch/9771921/
>> that the spin lock used in __schedule() should be RCsc to ensure
>> visibility of writes prior to __schedule() when the task is to be migrated to
>> another CPU.
>>
>> And this is emphasized in the comment on the newly introduced
>> smp_mb__after_spinlock(),
>>
>>  * This barrier must provide two things:
>>  *
>>  *   - it must guarantee a STORE before the spin_lock() is ordered against a
>>  *     LOAD after it, see the comments at its two usage sites.
>>  *
>>  *   - it must ensure the critical section is RCsc.
>>  *
>>  * The latter is important for cases where we observe values written by other
>>  * CPUs in spin-loops, without barriers, while being subject to scheduling.
>>  *
>>  * CPU0         CPU1            CPU2
>>  *
>>  *          for (;;) {
>>  *            if (READ_ONCE(X))
>>  *              break;
>>  *          }
>>  * X=1
>>  *          <sched-out>
>>  *                      <sched-in>
>>  *                      r = X;
>>  *
>>  * without transitivity it could be that CPU1 observes X!=0 breaks the loop,
>>  * we get migrated and CPU2 sees X==0.
>>
>> which is used in,
>>
>> __schedule(bool preempt) {
>>     ...
>>     rq_lock(rq, &rf);
>>     smp_mb__after_spinlock();
>>     ...
>> }
>> .
>>
>> Unless I am missing something, this kind of visibility does __not__
>> necessarily depend on the spin lock in __schedule() being RCsc.
>>
>> In fact, for a runnable task A, the migration would be,
>>
>>  CPU0         CPU1            CPU2
>>
>> <ACCESS before schedule out A>
>>
>> lock(rq0)
>> schedule out A
>> unlock(rq0)
>>
>>               lock(rq0)
>>               remove A from rq0
>>               unlock(rq0)
>>
>>               lock(rq2)
>>               add A into rq2
>>               unlock(rq2)
>>                                         lock(rq2)
>>                                         schedule in A
>>                                         unlock(rq2)
>>
>>                                         <ACCESS after schedule in A>
>>
>> <ACCESS before schedule out A> happens-before
>> unlock(rq0) happens-before
>> lock(rq0) happens-before
>> unlock(rq2) happens-before
>> lock(rq2) happens-before
>> <ACCESS after schedule in A>
>>
>
> But without an RCsc lock, you cannot guarantee that a write propagates to
> CPU 0 and CPU 2 at the same time, so the same write may propagate to
> CPU 0 before <ACCESS before schedule out A> but propagate to CPU 2 after
> <ACCESS after schedule in A>. So..
>
> Regards,
> Boqun

Thank you for pointing out this case, Boqun.
But this is just one of the special cases that acquire-release chains
already guarantee.

Initially, A=B=0.

  CPU0          CPU1          CPU2          CPU3

  write A=1
                read A=1
                write B=1
                release X
                              acquire X
                              read A=?
                              release Y
                                            acquire Y
                                            read B=?

assurance 1: CPU3 is guaranteed to see B=1 as written by CPU1, and
assurance 2: CPU2 is also guaranteed to see A=1 as written by CPU0, as a
special case
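
For concreteness, the diagram can be modelled with C++11 atomics. This is
only a sketch under assumptions: X and Y are plain flags standing in for the
lock words, each CPU is a std::thread, and the names Observed,
run_chain_once and count_violations are made up for illustration. It
exercises the C++11 rules, not any particular kernel primitive.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values returned by the two "?" reads in the diagram.
struct Observed {
    int a_seen_by_cpu2;
    int b_seen_by_cpu3;
};

// Runs the four-CPU scenario once. X and Y are hypothetical flags that
// model "release X"/"acquire X" and "release Y"/"acquire Y".
inline Observed run_chain_once() {
    std::atomic<int> A{0}, B{0}, X{0}, Y{0};
    Observed out{-1, -1};

    std::thread cpu0([&] {
        A.store(1, std::memory_order_relaxed);            // write A=1
    });
    std::thread cpu1([&] {
        while (A.load(std::memory_order_relaxed) != 1)    // read A=1
            std::this_thread::yield();
        B.store(1, std::memory_order_relaxed);            // write B=1
        X.store(1, std::memory_order_release);            // release X
    });
    std::thread cpu2([&] {
        while (X.load(std::memory_order_acquire) != 1)    // acquire X
            std::this_thread::yield();
        out.a_seen_by_cpu2 = A.load(std::memory_order_relaxed);  // read A=?
        Y.store(1, std::memory_order_release);            // release Y
    });
    std::thread cpu3([&] {
        while (Y.load(std::memory_order_acquire) != 1)    // acquire Y
            std::this_thread::yield();
        out.b_seen_by_cpu3 = B.load(std::memory_order_relaxed);  // read B=?
    });

    cpu0.join();
    cpu1.join();
    cpu2.join();
    cpu3.join();
    return out;
}

// Repeats the scenario and counts runs in which either read missed its 1.
inline int count_violations(int rounds) {
    assert(rounds > 0);
    int bad = 0;
    for (int i = 0; i < rounds; ++i) {
        Observed o = run_chain_once();
        if (o.a_seen_by_cpu2 != 1 || o.b_seen_by_cpu3 != 1)
            ++bad;
    }
    return bad;
}
```

Under the C++11 model both reads are required to return 1, so
count_violations(n) should return 0 for any n. A run on one machine cannot
prove the guarantee; it can only fail to find a counterexample.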

The second assurance holds both in theory and on real hardware.

As for theory, the C++11 memory model, which is a potential formal model
for the kernel memory model as
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0124r4.html
describes, states that:

If a value computation A of an atomic object M happens before a value
computation B of M, and A takes its value from a side effect X on M, then
the value computed by B shall either be the value stored by X or the value
stored by a side effect Y on M, where Y follows X in the modification
order of M.

in §1.10, paragraph 18, on page 14 of
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf
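
The rule can be exercised in isolation with a sketch like the one below. The
names Reads, run_coherence_once and count_incoherent are hypothetical, and
the handoff flag is an assumption that stands in for whatever makes the
first read happen before the second. M receives two stores so that the
"X or a later Y" wording is visible: the second reader may see 1 or 2, but
never a value older than the one the first reader saw.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// first:  value computation A of M (taken from some side effect X)
// second: value computation B of M, which A happens before
struct Reads {
    int first;
    int second;
};

inline Reads run_coherence_once() {
    std::atomic<int> M{0};
    std::atomic<bool> handoff{false};
    Reads r{-1, -1};

    std::thread writer([&] {
        M.store(1, std::memory_order_relaxed);  // modification order: 0, 1, 2
        M.store(2, std::memory_order_relaxed);
    });
    std::thread reader_a([&] {
        int v;
        while ((v = M.load(std::memory_order_relaxed)) == 0)
            std::this_thread::yield();
        r.first = v;                                     // A read 1 or 2
        handoff.store(true, std::memory_order_release);  // A happens before B
    });
    std::thread reader_b([&] {
        while (!handoff.load(std::memory_order_acquire))
            std::this_thread::yield();
        r.second = M.load(std::memory_order_relaxed);    // B
    });

    writer.join();
    reader_a.join();
    reader_b.join();
    return r;
}

// Counts runs in which B returned a value older than the one A returned.
// Stored values increase along the modification order, so ">=" checks it.
inline int count_incoherent(int rounds) {
    assert(rounds > 0);
    int bad = 0;
    for (int i = 0; i < rounds; ++i) {
        Reads r = run_coherence_once();
        if (r.first < 1 || r.second < r.first)
            ++bad;
    }
    return bad;
}
```

The accesses to M are all relaxed on purpose: the guarantee comes from the
coherence rule plus the happens-before edge through handoff, not from any
ordering attached to M itself.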

As for real hardware, Luc provides detailed tests and explanations for
ARM and POWER in section 5.1, "Cumulative Barriers for WRC", on page 19
of this paper:

A Tutorial Introduction to the ARM and POWER Relaxed Memory Models
https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf

So I think we can drop the RCsc requirement from smp_mb__after_spinlock(),
since it is really confusing.

Best Regards,
Trol

>
>> And for stopped tasks,
>>
>>  CPU0         CPU1            CPU2
>>
>> <ACCESS before schedule out A>
>>
>> lock(rq0)
>> schedule out A
>> remove A from rq0
>> store-release(A->on_cpu)
>> unlock(rq0)
>>
>>               load_acquire(A->on_cpu)
>>               set_task_cpu(A, 2)
>>
>>               lock(rq2)
>>               add A into rq2
>>               unlock(rq2)
>>
>>                                         lock(rq2)
>>                                         schedule in A
>>                                         unlock(rq2)
>>
>>                                         <ACCESS after schedule in A>
>>
>> <ACCESS before schedule out A> happens-before
>> store-release(A->on_cpu)  happens-before
>> load_acquire(A->on_cpu)  happens-before
>> unlock(rq2) happens-before
>> lock(rq2) happens-before
>> <ACCESS after schedule in A>
>>
>> So, I think the only requirement on smp_mb__after_spinlock() is
>> to guarantee that a STORE before the spin_lock() is ordered
>> against a LOAD after it. So we could remove the RCsc requirement
>> to allow a more efficient implementation.
>>
>> Did I miss something, or does this RCsc requirement not really matter?