[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190905205556.GB4125@linux.ibm.com>
Date: Thu, 5 Sep 2019 13:55:56 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Oleg Nesterov <oleg@...hat.com>,
Russell King - ARM Linux admin <linux@...linux.org.uk>,
Chris Metcalf <cmetcalf@...hip.com>,
Christoph Lameter <cl@...ux.com>,
Kirill Tkhai <tkhai@...dex.ru>, Mike Galbraith <efault@....de>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
Davidlohr Bueso <dave@...olabs.net>, mpe@...erman.id.au
Subject: Re: [PATCH 2/3] task: RCU protect tasks on the runqueue
On Thu, Sep 05, 2019 at 03:02:49PM -0500, Eric W. Biederman wrote:
> "Paul E. McKenney" <paulmck@...nel.org> writes:
>
> > On Tue, Sep 03, 2019 at 10:06:03PM +0200, Peter Zijlstra wrote:
> >> On Tue, Sep 03, 2019 at 12:18:47PM -0700, Linus Torvalds wrote:
> >> > Now, if you can point to some particular field where that ordering
> >> > makes sense for the particular case of "make it active on the
> >> > runqueue" vs "look up the task from the runqueue using RCU", then I do
> >> > think that the whole release->acquire consistency makes sense.
> >> >
> >> > But it's not clear that such a field exists, particularly when this is
> >> > in no way the *common* way to even get a task pointer, and other paths
> >> > do *not* use the runqueue as the serialization point.
> >>
> >> Even if we could find a case (and I'm not seeing one in a hurry), I
> >> would try really hard to avoid adding extra barriers here and instead
> >> make the consumer a little more complicated if at all possible.
> >>
> >> The Power folks got rid of a SYNC (yes, more expensive than LWSYNC) from
> >> their __switch_to() implementation and that had a measurable impact.
> >>
> >> 9145effd626d ("powerpc/64: Drop explicit hwsync in context switch")
> >
> > The patch [1] looks good to me. And yes, if the structure pointed to by
> > the second argument of rcu_assign_pointer() is already visible to readers,
> > it is OK to instead use RCU_INIT_POINTER(). Yes, this loses ordering.
> > But weren't these simple assignments before RCU got involved?
> >
> > As a very rough rule of thumb, LWSYNC is about twice as fast as SYNC.
> > (Depends on workload, exact details of the hardware, timing, phase of
> > the moon, you name it.) So removing the LWSYNC is likely to provide
> > measureable benefit, but I must defer to the powerpc maintainers.
> > To that end, I added Michael on CC.
> >
> > [1] https://lore.kernel.org/lkml/878sr6t21a.fsf_-_@x220.int.ebiederm.org/
>
> Paul, what is the purpose of the barrier in rcu_assign_pointer?
>
> My intuition says it is the assignment half of rcu_dereference, and that
> anything that rcu_dereference does not need is too strong.
>
> My basic understanding is that only alpha ever has the memory ordering
> issue that rcu_dereference deals with. So I am a bit surprised that
> this is anything other than a noop for anything except alpha.
Yes, only Alpha needs an actual memory-barrier instruction in
rcu_dereference(). And it is the only one that gets one.
> In my patch I used rcu_assign_pointer because that is the canonically
> correct way to do things. Peter makes a good case that adding an extra
> barrier in ___schedule could be detrimental to system performance.
> At the same time if there is a correctness issue on alpha that we have
> been overlooking because of low testing volume on alpha I don't want to
> just let this slide and have very subtle bugs.
But rcu_assign_pointer() needs a memory-barrier instruction on the
weakly ordered architectures: ARM, powerpc, Itanium, and so on.
> The practical concern is that people have been really wanting to do
> lockless and rcu operations on tasks in the runqueue for a while and
> there are several very clever pieces of code doing that now. By
> changing the location of the rcu put I am trying to make these uses
> ordinary rcu uses.
>
> The uses in question are the pieces of code I update in:
> https://lore.kernel.org/lkml/8736het20c.fsf_-_@x220.int.ebiederm.org/
The rcu_assign_pointer() at the end of rcuwait_wait_event(), right?
> In short. Why is rcu_assign_pointer() more than WRITE_ONCE() on
> anything but alpha? Do other architectures need more? Is the trade off
> worth it if we avoid using rcu_assign_pointer on performance critical
> paths.
Note the difference between the read-side rcu_dereference(), which
does not require any memory-barrier instructions except on Alpha,
and the update-side rcu_assign_pointer() which does require a
memory-barrier instruction on quite a few architectures. Even on
the strongly ordered architectures (x86, s390, ...) a compiler
barrier() is required for rcu_assign_pointer().
Except note the exceptional cases where RCU_INIT_POINTER() may be
used in place of rcu_assign_pointer(), which are called out in
RCU_INIT_POINTER()'s header comment:
* Initialize an RCU-protected pointer in special cases where readers
* do not need ordering constraints on the CPU or the compiler. These
* special cases are:
*
* 1. This use of RCU_INIT_POINTER() is NULLing out the pointer *or*
* 2. The caller has taken whatever steps are required to prevent
* RCU readers from concurrently accessing this pointer *or*
* 3. The referenced data structure has already been exposed to
* readers either at compile time or via rcu_assign_pointer() *and*
*
* a. You have not made *any* reader-visible changes to
* this structure since then *or*
* b. It is OK for readers accessing this structure from its
* new location to see the old state of the structure. (For
* example, the changes were to statistical counters or to
* other state where exact synchronization is not required.)
*
* Failure to follow these rules governing use of RCU_INIT_POINTER() will
* result in impossible-to-diagnose memory corruption. As in the structures
* will look OK in crash dumps, but any concurrent RCU readers might
* see pre-initialized values of the referenced data structure. So
* please be very careful how you use RCU_INIT_POINTER()!!!
If current is already visible (which it should be unless
rcuwait_wait_event() is invoked at task-creation time), then
RCU_INIT_POINTER() could be used in rcuwait_wait_event().
> Eric
>
> p.s. I am being slow at working through all of this as I am dealing
> with my young baby son, and busy packing for the conference.
Congratulations on the baby son!!! I remember those times well, but
they were more than three decades ago for my oldest. ;-)
> I might not be able to get back to this discussion until after
> I have landed in Lisbon on Saturday night.
Looking forward to it!
Thanx, Paul
Powered by blists - more mailing lists