Date:   Fri, 5 Jul 2019 11:38:53 -0400 (EDT)
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Ingo Molnar <mingo@...nel.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        x86 <x86@...nel.org>, Nadav Amit <namit@...are.com>,
        paulmck <paulmck@...ux.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Will Deacon <will.deacon@....com>
Subject: Re: [PATCH] cpu/hotplug: Cache number of online CPUs



----- On Jul 5, 2019, at 4:49 AM, Ingo Molnar mingo@...nel.org wrote:

> * Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
> 
>> ----- On Jul 4, 2019, at 6:33 PM, Thomas Gleixner tglx@...utronix.de wrote:
>> 
>> > On Thu, 4 Jul 2019, Mathieu Desnoyers wrote:
>> >> ----- On Jul 4, 2019, at 5:10 PM, Thomas Gleixner tglx@...utronix.de wrote:
>> >> >
>> >> > num_online_cpus() is racy today vs. CPU hotplug operations as
>> >> > long as you don't hold the hotplug lock.
>> >> 
>> >> Fair point, AFAIU none of the loads performed within num_online_cpus()
>> >> seem to rely on atomic or volatile accesses. So not using a volatile
>> >> access to load the cached value should not introduce any regression.
>> >> 
>> >> I'm concerned that some code may rely on re-fetching of the cached
>> >> value between iterations of a loop. The lack of READ_ONCE() would
>> >> let the compiler keep a lifted load within a register and never
>> >> re-fetch, unless there is a cpu_relax() or a barrier() within the
>> >> loop.
>> > 
>> > If someone really wants to write code which can handle concurrent CPU
>> > hotplug operations and rely on that information, then it's probably better
>> > to write out:
>> > 
>> >     ncpus = READ_ONCE(__num_online_cpus);
>> > 
>> > explicitly along with a big fat comment.
>> > 
>> > I can't figure out why one wants to do that and how it is supposed to work,
>> > but my brain is in shutdown mode already :)
>> > 
>> > I'd rather write a proper kernel doc comment for num_online_cpus() which
>> > explains what the constraints are instead of pretending that the READ_ONCE
>> > in the inline has any meaning.
>> 
>> The other aspect I am concerned about is freedom given to the compiler
>> to perform the store to __num_online_cpus non-atomically, or the load
>> non-atomically due to memory pressure.
> 
> What connection does "memory pressure" have to what the compiler does?
> 
> Did you confuse it with "register pressure"?

My brain wires must have been crossed when I wrote that. Indeed, I meant
register pressure.

> 
>> Is that something we should be concerned about ?
> 
> Once I understand it :)
> 
>> I thought we had WRITE_ONCE and READ_ONCE to take care of that kind of
>> situation.
> 
> Store and load tearing is one of the minor properties of READ_ONCE() and
> WRITE_ONCE() - the main properties are the ordering guarantees.
> 
> Since __num_online_cpus is neither weirdly aligned nor is it written via
> constants I don't see how load/store tearing could occur. Can you outline
> such a scenario?

I'm referring to the examples provided in Documentation/core-api/atomic_ops.rst,
e.g.:

"For another example, consider the following code::

        tmp_a = a;
        do_something_with(tmp_a);
        do_something_else_with(tmp_a);

 If the compiler can prove that do_something_with() does not store to the
 variable a, then the compiler is within its rights to manufacture an
 additional load as follows::

        tmp_a = a;
        do_something_with(tmp_a);
        tmp_a = a;
        do_something_else_with(tmp_a);"

So if instead of "a" we have num_online_cpus(), the compiler would be
within its rights to re-fetch the number of online CPUs between the two
calls, which seems unexpected.
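
Translated to this patch, a sketch only (example_caller() is a hypothetical
caller, and do_something_with()/do_something_else_with() are just the
placeholders from the documentation excerpt above):

        /* Hypothetical caller, mirroring the tmp_a example above. */
        static void example_caller(void)
        {
                unsigned int ncpus = num_online_cpus();

                do_something_with(ncpus);
                do_something_else_with(ncpus);

                /*
                 * If num_online_cpus() is a plain load of __num_online_cpus,
                 * the compiler is allowed to reload __num_online_cpus before
                 * the second call instead of reusing "ncpus", so the two
                 * calls can observe different values across a concurrent
                 * CPU hotplug operation.
                 */
        }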

> 
>> The semantic I am looking for here is C11's relaxed atomics.
> 
> What does this mean?

C11 states:

"Atomic operations specifying memory_order_relaxed are  relaxed  only  with  respect
to memory ordering.  Implementations must still guarantee that any given atomic access
to a particular atomic object be indivisible with respect to all other atomic accesses
to that object."

So I am concerned that num_online_cpus(), as proposed in this patch,
tries to access __num_online_cpus non-atomically and without using
READ_ONCE().
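
What I have in mind for the read side is roughly the following (a sketch,
not the exact code from the patch; the unsigned int type of
__num_online_cpus is assumed here):

        static inline unsigned int num_online_cpus(void)
        {
                /*
                 * Single, non-torn load of the cached value; this is the
                 * kernel spelling of a C11 relaxed atomic load.
                 */
                return READ_ONCE(__num_online_cpus);
        }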

Similarly, the update side should use WRITE_ONCE(): protecting the update
with a mutex does not provide mutual exclusion against concurrent readers
of that variable.

Again based on Documentation/core-api/atomic_ops.rst:

"For a final example, consider the following code, assuming that the
 variable a is set at boot time before the second CPU is brought online
 and never changed later, so that memory barriers are not needed::

        if (a)
                b = 9;
        else
                b = 42;

 The compiler is within its rights to manufacture an additional store
 by transforming the above code into the following::

        b = 42;
        if (a)
                b = 9;

 This could come as a fatal surprise to other code running concurrently
 that expected b to never have the value 42 if a was zero.  To prevent
 the compiler from doing this, write something like::

        if (a)
                WRITE_ONCE(b, 9);
        else
                WRITE_ONCE(b, 42);"

If the compiler decides to manufacture additional stores when updating
the cached __num_online_cpus value, loads could observe such intermediate
values, because readers fetch it without holding any mutex.
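
On the update side, something along these lines would keep the store single
and non-torn (a sketch; set_num_online_cpus() is a hypothetical helper, and
the existing hotplug locking around the update is assumed):

        /* Caller is assumed to hold the hotplug lock, as today. */
        static void set_num_online_cpus(unsigned int ncpus)
        {
                /*
                 * WRITE_ONCE() prevents the compiler from splitting the
                 * store or emitting speculative intermediate values that
                 * lockless readers of __num_online_cpus could observe.
                 */
                WRITE_ONCE(__num_online_cpus, ncpus);
        }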

Thanks,

Mathieu

> 
> Thanks,
> 
> 	Ingo

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
