Message-ID: <20210721212515.GV4397@paulmck-ThinkPad-P17-Gen-1>
Date: Wed, 21 Jul 2021 14:25:15 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: rcu@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Kernel Team <kernel-team@...com>,
Ingo Molnar <mingo@...nel.org>,
Lai Jiangshan <jiangshanlai@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Josh Triplett <josh@...htriplett.org>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
David Howells <dhowells@...hat.com>,
Eric Dumazet <edumazet@...gle.com>,
Frédéric Weisbecker <fweisbec@...il.com>,
Oleg Nesterov <oleg@...hat.com>,
Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [PATCH rcu 04/18] rcu: Weaken ->dynticks accesses and updates
On Wed, Jul 21, 2021 at 01:41:46PM -0700, Linus Torvalds wrote:
> Hmm.
>
> This actually seems to make some of the ordering worse.
>
> I'm not seeing a lot of weakening or optimization, but it depends a
> bit on what is common and what is not.
Agreed, and I expect that I will be reworking this patch rather
thoroughly.
Something about smp_mb() often being a locked atomic operation on a
stack location. :-/
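In case it is useful for others on the thread: on x86, smp_mb() boils down
to (roughly) a LOCK-prefixed read-modify-write of a stack slot, along these
lines -- just a sketch, not the exact arch/x86 barrier.h definition, and
sketch_smp_mb() is a made-up name:

	/*
	 * Rough sketch of the x86 idea: a LOCK-prefixed RMW on a stack slot
	 * acts as a full memory barrier, much like the barrier that is
	 * already implicit in a value-returning atomic such as
	 * arch_atomic_inc_return().
	 */
	#define sketch_smp_mb() \
		asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc")

So the old single atomic already bought us the full barrier, while the new
sequence pays for it separately.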
But you did ask for this to be sped up some years back (before the
memory model was formalized), so I figured I should at least show what
can be done. Plus I expect that you know much more about what Intel is
planning than I do.
> On Wed, Jul 21, 2021 at 1:21 PM Paul E. McKenney <paulmck@...nel.org> wrote:
> >
> > +/*
> > + * Increment the current CPU's rcu_data structure's ->dynticks field
> > + * with ordering. Return the new value.
> > + */
> > +static noinstr unsigned long rcu_dynticks_inc(int incby)
> > +{
> > +	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > +	int seq;
> > +
> > +	seq = READ_ONCE(rdp->dynticks) + incby;
> > +	smp_store_release(&rdp->dynticks, seq);
> > +	smp_mb(); // Fundamental RCU ordering guarantee.
> > +	return seq;
> > +}
>
> So this is actually likely *more* expensive than the old code was, at
> least on x86.
>
> The READ_ONCE/smp_store_release are cheap, but then the smp_mb() is expensive.
>
> The old code did just arch_atomic_inc_return(), which included the
> memory barrier.
>
> There *might* be some cache ordering advantage to letting the
> READ_ONCE() float upwards, but from a pure barrier standpoint this is
> more expensive than what we used to have.
No argument here.
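For reference, the pre-patch shape was essentially the following (a sketch
with a hypothetical helper name, not the exact tree contents), where the
value-returning atomic RMW supplies the full ordering all by itself:

	/*
	 * Sketch of the pre-patch form: arch_atomic_inc_return() is fully
	 * ordered (a LOCK-prefixed RMW on x86), so no separate smp_mb()
	 * is needed afterwards.
	 */
	static noinstr unsigned long rcu_dynticks_inc_old(void)
	{
		struct rcu_data *rdp = this_cpu_ptr(&rcu_data);

		return arch_atomic_inc_return(&rdp->dynticks);
	}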
> > -	if (atomic_read(&rdp->dynticks) & 0x1)
> > +	if (READ_ONCE(rdp->dynticks) & 0x1)
> >  		return;
> > -	atomic_inc(&rdp->dynticks);
> > +	rcu_dynticks_inc(1);
>
> And this one seems to not take advantage of the new rule, so we end up
> having two reads, and then that potentially more expensive sequence.
This one only executes when a CPU comes online, so I am not worried
about its overhead.
> >  static int rcu_dynticks_snap(struct rcu_data *rdp)
> >  {
> > -	return atomic_add_return(0, &rdp->dynticks);
> > +	smp_mb(); // Fundamental RCU ordering guarantee.
> > +	return smp_load_acquire(&rdp->dynticks);
> >  }
>
> This is likely cheaper - not because of barriers, but simply because
> it avoids dirtying the cacheline.
>
> So which operation do we _care_ about, and do we have numbers for why
> this improves anything? Because looking at the patch, it's not obvious
> that this is an improvement.
It sounds like I should keep this hunk and revert the rest back to
atomic operations, but still in the new rcu_dynticks_inc() function.
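Roughly this direction, in other words (just a sketch, with ->dynticks back
to being an atomic_t; names and details to be sorted out when I redo the
patch):

	/* Increment: back to a single fully ordered atomic RMW. */
	static noinstr unsigned long rcu_dynticks_inc(int incby)
	{
		return arch_atomic_add_return(incby, this_cpu_ptr(&rcu_data.dynticks));
	}

	/*
	 * Snapshot: keep the read-only form that avoids dirtying the
	 * cacheline, just on an atomic_t this time.
	 */
	static int rcu_dynticks_snap(struct rcu_data *rdp)
	{
		smp_mb(); // Fundamental RCU ordering guarantee.
		return atomic_read_acquire(&rdp->dynticks);
	}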
Either way, thank you for looking this over!
Thanx, Paul