lists.openwall.net: Open Source and information security mailing list archives
Date: Wed, 21 Jul 2021 14:25:15 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: rcu@...r.kernel.org, Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, Kernel Team <kernel-team@...com>, Ingo Molnar <mingo@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>, Andrew Morton <akpm@...ux-foundation.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Josh Triplett <josh@...htriplett.org>, Thomas Gleixner <tglx@...utronix.de>, Peter Zijlstra <peterz@...radead.org>, Steven Rostedt <rostedt@...dmis.org>, David Howells <dhowells@...hat.com>, Eric Dumazet <edumazet@...gle.com>, Frédéric Weisbecker <fweisbec@...il.com>, Oleg Nesterov <oleg@...hat.com>, Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [PATCH rcu 04/18] rcu: Weaken ->dynticks accesses and updates

On Wed, Jul 21, 2021 at 01:41:46PM -0700, Linus Torvalds wrote:
> Hmm.
>
> This actually seems to make some of the ordering worse.
>
> I'm not seeing a lot of weakening or optimization, but it depends a
> bit on what is common and what is not.

Agreed, and I expect that I will be reworking this patch rather
thoroughly. Something about smp_mb() often being a locked atomic
operation on a stack location. :-/

But you did ask for this to be sped up some years back (before the
memory model was formalized), so I figured I should at least show what
can be done. Plus I expect that you know much more about what Intel is
planning than I do.

> On Wed, Jul 21, 2021 at 1:21 PM Paul E. McKenney <paulmck@...nel.org> wrote:
> >
> > +/*
> > + * Increment the current CPU's rcu_data structure's ->dynticks field
> > + * with ordering. Return the new value.
> > + */
> > +static noinstr unsigned long rcu_dynticks_inc(int incby)
> > +{
> > +	struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > +	int seq;
> > +
> > +	seq = READ_ONCE(rdp->dynticks) + incby;
> > +	smp_store_release(&rdp->dynticks, seq);
> > +	smp_mb(); // Fundamental RCU ordering guarantee.
> > +	return seq;
> > +}
>
> So this is actually likely *more* expensive than the old code was, at
> least on x86.
>
> The READ_ONCE/smp_store_release are cheap, but then the smp_mb() is expensive.
>
> The old code did just arch_atomic_inc_return(), which included the
> memory barrier.
>
> There *might* be some cache ordering advantage to letting the
> READ_ONCE() float upwards, but from a pure barrier standpoint this is
> more expensive than what we used to have.

No argument here.

> > -	if (atomic_read(&rdp->dynticks) & 0x1)
> > +	if (READ_ONCE(rdp->dynticks) & 0x1)
> >  		return;
> > -	atomic_inc(&rdp->dynticks);
> > +	rcu_dynticks_inc(1);
>
> And this one seems to not take advantage of the new rule, so we end up
> having two reads, and then that potentially more expensive sequence.

This one only executes when a CPU comes online, so I am not worried
about its overhead.

> >  static int rcu_dynticks_snap(struct rcu_data *rdp)
> >  {
> > -	return atomic_add_return(0, &rdp->dynticks);
> > +	smp_mb(); // Fundamental RCU ordering guarantee.
> > +	return smp_load_acquire(&rdp->dynticks);
> >  }
>
> This is likely cheaper - not because of barriers, but simply because
> it avoids dirtying the cacheline.
>
> So which operation do we _care_ about, and do we have numbers for why
> this improves anything? Because looking at the patch, it's not obvious
> that this is an improvement.

It sounds like I should keep this hunk and revert the rest back to
atomic operations, but still in the new rcu_dynticks_inc() function.

Either way, thank you for looking this over!

							Thanx, Paul
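The cost argument in this exchange turns on which primitive each variant uses. The following is a rough userspace sketch, not the kernel's code, assuming the usual approximate C11 <stdatomic.h> correspondences: READ_ONCE() as a relaxed load, smp_store_release() as a release store, smp_mb() as a seq_cst fence, and the value-returning atomic_inc_return()/atomic_add_return() as seq_cst read-modify-writes. The global `dynticks` and all function names are hypothetical, and nothing here measures cost; it only shows the shape of the two increment variants and the two snapshot variants.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for one CPU's rdp->dynticks counter. */
static _Atomic int dynticks;

/* Pre-patch style increment: one fully ordered read-modify-write,
 * standing in for arch_atomic_inc_return(). Returns the new value. */
static int dynticks_inc_rmw(int incby)
{
    return atomic_fetch_add_explicit(&dynticks, incby, memory_order_seq_cst) + incby;
}

/* Patch style increment: relaxed read (READ_ONCE()), release store
 * (smp_store_release()), then a separate full fence (smp_mb()).
 * The load/store pair is not atomic, so this is only safe if a single
 * thread ever updates the counter (assumed here, as with a per-CPU
 * field reached through this_cpu_ptr()). Returns the new value. */
static int dynticks_inc_release_fence(int incby)
{
    int seq = atomic_load_explicit(&dynticks, memory_order_relaxed) + incby;

    atomic_store_explicit(&dynticks, seq, memory_order_release);
    atomic_thread_fence(memory_order_seq_cst);
    return seq;
}

/* Pre-patch snapshot: atomic_add_return(0, ...) as an RMW of zero.
 * The value is unchanged, but the RMW still writes the cache line. */
static int dynticks_snap_rmw(void)
{
    return atomic_fetch_add_explicit(&dynticks, 0, memory_order_seq_cst);
}

/* Patch style snapshot: full fence (smp_mb()) followed by an acquire
 * load (smp_load_acquire()); the cache line is only read. */
static int dynticks_snap_acquire(void)
{
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&dynticks, memory_order_acquire);
}

/* Single-threaded sanity check: all four helpers agree on the counter
 * value. This says nothing about their relative cost. */
static void dynticks_check_sequence(void)
{
    atomic_store_explicit(&dynticks, 0, memory_order_relaxed);
    assert(dynticks_inc_rmw(1) == 1);
    assert(dynticks_inc_release_fence(1) == 2);
    assert(dynticks_snap_rmw() == 2);
    assert(dynticks_snap_acquire() == 2);
}
```

Calling dynticks_check_sequence() from a test driver exercises all four helpers in order. The sketch mirrors the trade-off discussed in the thread: dynticks_inc_release_fence() replaces one fully ordered RMW with a store plus a separate full fence, which is why it may cost more rather than less, while dynticks_snap_acquire() avoids the write to the cache line that dynticks_snap_rmw() performs even when adding zero.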