Message-ID: <824482130.8027.1562341133252.JavaMail.zimbra@efficios.com>
Date: Fri, 5 Jul 2019 11:38:53 -0400 (EDT)
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>,
linux-kernel <linux-kernel@...r.kernel.org>,
x86 <x86@...nel.org>, Nadav Amit <namit@...are.com>,
paulmck <paulmck@...ux.ibm.com>,
Peter Zijlstra <peterz@...radead.org>,
Will Deacon <will.deacon@....com>
Subject: Re: [PATCH] cpu/hotplug: Cache number of online CPUs
----- On Jul 5, 2019, at 4:49 AM, Ingo Molnar mingo@...nel.org wrote:
> * Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
>
>> ----- On Jul 4, 2019, at 6:33 PM, Thomas Gleixner tglx@...utronix.de wrote:
>>
>> > On Thu, 4 Jul 2019, Mathieu Desnoyers wrote:
>> >> ----- On Jul 4, 2019, at 5:10 PM, Thomas Gleixner tglx@...utronix.de wrote:
>> >> >
>> >> > num_online_cpus() is racy today vs. CPU hotplug operations as
>> >> > long as you don't hold the hotplug lock.
>> >>
>> >> Fair point, AFAIU none of the loads performed within num_online_cpus()
>> >> seem to rely on atomic or volatile accesses. So not using a volatile
>> >> access to load the cached value should not introduce any regression.
>> >>
>> >> I'm concerned that some code may rely on re-fetching of the cached
>> >> value between iterations of a loop. The lack of READ_ONCE() would
>> >> let the compiler keep a lifted load within a register and never
>> >> re-fetch, unless there is a cpu_relax() or a barrier() within the
>> >> loop.
>> >
>> > If someone really wants to write code which can handle concurrent CPU
>> > hotplug operations and rely on that information, then it's probably better
>> > to write out:
>> >
>> > ncpus = READ_ONCE(__num_online_cpus);
>> >
>> > explicitly along with a big fat comment.
>> >
>> > I can't figure out why one wants to do that and how it is supposed to work,
>> > but my brain is in shutdown mode already :)
>> >
>> > I'd rather write a proper kernel doc comment for num_online_cpus() which
>> > explains what the constraints are instead of pretending that the READ_ONCE
>> > in the inline has any meaning.
>>
>> The other aspect I am concerned about is freedom given to the compiler
>> to perform the store to __num_online_cpus non-atomically, or the load
>> non-atomically due to memory pressure.
>
> What connection does "memory pressure" have to what the compiler does?
>
> Did you confuse it with "register pressure"?
My brain wires must have been crossed when I wrote that. Indeed, I meant
register pressure.
>
>> Is that something we should be concerned about ?
>
> Once I understand it :)
>
>> I thought we had WRITE_ONCE and READ_ONCE to take care of that kind of
>> situation.
>
> Store and load tearing is one of the minor properties of READ_ONCE() and
> WRITE_ONCE() - the main properties are the ordering guarantees.
>
> Since __num_online_cpus is neither weirdly aligned nor is it written via
> constants I don't see how load/store tearing could occur. Can you outline
> such a scenario?
I'm referring to the examples provided in Documentation/core-api/atomic_ops.rst,
e.g.:
"For another example, consider the following code::
tmp_a = a;
do_something_with(tmp_a);
do_something_else_with(tmp_a);
If the compiler can prove that do_something_with() does not store to the
variable a, then the compiler is within its rights to manufacture an
additional load as follows::
tmp_a = a;
do_something_with(tmp_a);
tmp_a = a;
do_something_else_with(tmp_a);"
So if instead of "a" we have num_online_cpus(), the compiler would be
within its rights to re-fetch the number of online CPUs between the two
calls, which seems unexpected.
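To make that concrete, here is a minimal sketch of the pattern I am worried
about (the caller and its arithmetic are made up, only the shape matters):

	/* Hypothetical caller, for illustration only. */
	static unsigned int compute_shards(void)
	{
		unsigned int ncpus, shards;

		/*
		 * If num_online_cpus() is a plain (non-READ_ONCE) read of
		 * __num_online_cpus, the compiler may manufacture a second
		 * load of that variable between the two uses of 'ncpus'
		 * below, so the divisor and the bound could disagree across
		 * a concurrent hotplug operation.
		 */
		ncpus = num_online_cpus();

		shards = DIV_ROUND_UP(1024U, ncpus);	/* first use */
		return min(shards, ncpus);		/* second use */
	}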
>
>> The semantic I am looking for here is C11's relaxed atomics.
>
> What does this mean?
C11 states:
"Atomic operations specifying memory_order_relaxed are relaxed only with respect
to memory ordering. Implementations must still guarantee that any given atomic access
to a particular atomic object be indivisible with respect to all other atomic accesses
to that object."
So I am concerned that num_online_cpus() as proposed in this patch
tries to access __num_online_cpus non-atomically, and without using
READ_ONCE().
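Expressed as C11, the read side I am after is roughly this (user-space
sketch with a stand-in variable, not the kernel code):

	#include <stdatomic.h>

	/* Stand-in for the kernel's cached counter, for illustration only. */
	static _Atomic unsigned int nr_online;

	static unsigned int load_nr_online(void)
	{
		/*
		 * memory_order_relaxed imposes no ordering, but the load
		 * itself must still be a single indivisible access -- the
		 * property READ_ONCE() is meant to provide here.
		 */
		return atomic_load_explicit(&nr_online, memory_order_relaxed);
	}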
Similarly, the update side should use WRITE_ONCE(). Protecting the store with
a mutex does not provide mutual exclusion against concurrent readers of that
variable.
Again based on Documentation/core-api/atomic_ops.rst:
"For a final example, consider the following code, assuming that the
variable a is set at boot time before the second CPU is brought online
and never changed later, so that memory barriers are not needed::
if (a)
b = 9;
else
b = 42;
The compiler is within its rights to manufacture an additional store
by transforming the above code into the following::
b = 42;
if (a)
b = 9;
This could come as a fatal surprise to other code running concurrently
that expected b to never have the value 42 if a was zero. To prevent
the compiler from doing this, write something like::
if (a)
WRITE_ONCE(b, 9);
else
WRITE_ONCE(b, 42);"
If the compiler decides to manufacture additional stores when updating
the __num_online_cpus cached value, then loads could observe this
intermediate value because they fetch it without holding any mutex.
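Concretely, what I had in mind looks roughly like this (a sketch of the
accessors only, reusing the patch's __num_online_cpus; the surrounding
hotplug code is elided, and the update helper is hypothetical):

	/* Sketch only, not the patch as posted. */
	extern unsigned int __num_online_cpus;

	static inline unsigned int num_online_cpus(void)
	{
		/* Lockless readers get a single, non-torn load. */
		return READ_ONCE(__num_online_cpus);
	}

	/* Hypothetical update helper, called with the hotplug lock held. */
	static void adjust_num_online_cpus(int delta)
	{
		/*
		 * WRITE_ONCE() keeps the compiler from emitting intermediate
		 * or torn stores that concurrent lockless readers could
		 * otherwise observe.
		 */
		WRITE_ONCE(__num_online_cpus, __num_online_cpus + delta);
	}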
Thanks,
Mathieu
>
> Thanks,
>
> Ingo
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com