Message-ID: <alpine.DEB.2.11.1409030846310.9372@gentwo.org>
Date: Wed, 3 Sep 2014 09:10:24 -0500 (CDT)
From: Christoph Lameter <cl@...ux.com>
To: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
cc: Frederic Weisbecker <fweisbec@...il.com>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC] dynticks: dynticks_idle is only modified locally use
this_cpu ops
On Tue, 2 Sep 2014, Paul E. McKenney wrote:
> On Tue, Sep 02, 2014 at 06:22:52PM -0500, Christoph Lameter wrote:
> > On Tue, 2 Sep 2014, Paul E. McKenney wrote:
> >
> > > Yep, these two have been on my "when I am feeling insanely gutsy" list
> > > for quite some time.
> > >
> > > But I have to ask... On x86, is a pair of mfence instructions really
> > > cheaper than an atomic increment?
> >
> > Not sure why you would need an mfence instruction?
>
> Because otherwise RCU can break. As soon as the grace-period machinery
> sees that the value of this variable is even, it assumes a quiescent
> state. If there are no memory barriers, the non-quiescent code might
> not have completed executing, and your kernel's actuarial statistics
> become sub-optimal.
Synchronization using per-cpu variables is bound to be problematic, since
they are simply not made for that. A per-cpu variable can usually change
without notice to other CPUs, since per-cpu processing is typically
ongoing. The improved performance of per-cpu instructions is possible only
because we exploit the fact that there is no need for synchronization.
Kernel statistics *are* suboptimal for that very reason: they typically
sum up individual counters from multiple processors without regard for
complete accuracy. Manipulating the VM counters has very low overhead
precisely because there is no concern for synchronization; that is a
tradeoff of accuracy vs. performance. We can actually tune the fuzziness
of the VM statistics, which lets us control the overhead generated by the
need for more or less accurate statistics.
Memory barriers ensure that the code has completed executing? I think what
is meant is that they ensure that all modifications to cachelines made
before the change of state are visible, so that the other processor does
not have stale cachelines around?
If the state variable is odd, how does the other processor see a state
change to even before processing is complete, if the state is updated only
at the end of processing?