linux-kernel - Re: [PATCH 2/2] smp_call_function: use rwlocks on queues rather than rcu

Date:	Wed, 27 Aug 2008 08:16:51 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Christoph Lameter <cl@...ux-foundation.org>,
	Pekka Enberg <penberg@...helsinki.fi>,
	Ingo Molnar <mingo@...e.hu>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	Nick Piggin <nickpiggin@...oo.com.au>,
	Andi Kleen <andi@...stfloor.org>,
	"Pallipadi, Venkatesh" <venkatesh.pallipadi@...el.com>,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	Rusty Russell <rusty@...tcorp.com.au>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] smp_call_function: use rwlocks on queues rather
	than rcu

On Tue, Aug 26, 2008 at 04:07:00PM +0200, Peter Zijlstra wrote:
> On Tue, 2008-08-26 at 06:43 -0700, Paul E. McKenney wrote:
> > On Mon, Aug 25, 2008 at 05:51:32PM +0200, Peter Zijlstra wrote:
> > > On Mon, 2008-08-25 at 10:46 -0500, Christoph Lameter wrote:
> > > > Peter Zijlstra wrote:
> > > > >
> > > > > If we combine these two cases, and flip the counter as soon as we've
> > > > > enqueued one callback, unless we're already waiting for a grace period
> > > > > to end - which gives us a longer window to collect callbacks.
> > > > > 
> > > > > And then the rcu_read_unlock() can do:
> > > > > 
> > > > >   if (dec_and_zero(my_counter) && my_index == dying)
> > > > >     raise_softirq(RCU)
> > > > > 
> > > > > to fire off the callback stuff.
> > > > > 
> > > > > /me ponders - there must be something wrong with that...
> > > > > 
> > > > > > Aaah, yes, the dec_and_zero is nontrivial due to the fact that it's a
> > > > > > distributed counter. Bugger..
> > > > 
> > > > > Then let's make it per cpu. If we get the cpu ops in then dec_and_zero would be
> > > > > very cheap.
> > > 
> > > Hmm, perhaps that might work for classic RCU, as that disables
> > > preemption and thus the counters should always be balanced.
> > 
> > Unless you use a pair of global counters (like QRCU), you will still
> > need to check a large number of counters for zero.  I suppose that one
> > approach would be to do something like QRCU, but with some smallish
> > number of counter pairs, each of which is shared by a moderate group of
> > CPUs.  For example, for 4,096 CPUs, use 64 pairs of counters, each
> > shared by 64 CPUs.  My guess is that the rcu_read_lock() overhead would
> > make this be a case of "Holy overhead, Batman!!!", but then again, I
> > cannot claim to be an expert on 4,096-CPU machines.
> 
> right - while the local count will be balanced and will always end up on
> zero, you have to check remote counts for zero as well.
> 
> But after a counter flip, the dying counter will only reach zero once
> per cpu.

Yep.

> So each cpu gets to tickle a softirq once per cycle. That softirq can
> then check all remote counters, and kick off the callback list when it
> finds them all zero.

Which might not be so good from a powertop viewpoint, since grace
periods only take a few milliseconds, and powertop likes to keep CPUs
sleeping for seconds.  But one could designate a set of CPUs that
scanned nearby counters -- carefully chosen WRT the system in question
to avoid preventing cores from being powered down.  :-/

> Of course, this scan is very expensive, n^2 at worst, each cpu
> triggering a full scan, until finally the last cpu is done.

One could keep an index of already-scanned counters, which would
help keep things down to a dull roar.
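
A minimal sketch of the index idea, for illustration only.  It relies
on the point made in the quoted text that, after a flip, a dying
counter that has reached zero stays at zero.  The names (struct
gp_scan, gp_scan_resume, SCAN_NR_CPUS) are hypothetical, and the code
is single-threaded: a real implementation would need per-CPU storage,
atomic accesses, and memory barriers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SCAN_NR_CPUS 8	/* hypothetical, kept small for illustration */

struct gp_scan {
	long dying[SCAN_NR_CPUS];	/* per-CPU count of pre-flip readers */
	size_t next;			/* first counter not yet seen at zero */
};

/*
 * Resume the scan where the previous one stopped.  Counters below
 * s->next were already observed at zero and are never re-read.
 * Returns true once every dying counter has been seen at zero.
 */
static bool gp_scan_resume(struct gp_scan *s)
{
	assert(s->next <= SCAN_NR_CPUS);

	while (s->next < SCAN_NR_CPUS) {
		if (s->dying[s->next] != 0)
			return false;	/* readers remain; resume here later */
		s->next++;
	}
	return true;
}
```

Across a whole grace period each counter is passed over at most once,
plus one re-check of the blocking counter per scan, so the total work
is linear in the number of CPUs plus the number of scans.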

> We could optimize this by keeping cpu masks of cpus found to have !0
> counts - those who were found to have 0, will always stay zero, so we'll
> not have to look at them again.

OK, a mask would work, though an index would be faster.
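
For comparison, a rough sketch of the mask variant, under the same
assumptions as before: hypothetical names (gp_scan_mask, MASK_NR_CPUS),
a single 64-bit word standing in for a real cpumask, and no
concurrency handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MASK_NR_CPUS 8	/* hypothetical, must not exceed 64 in this sketch */

/*
 * Clear the pending bit of every CPU whose dying counter is now zero.
 * CPUs whose bit is already clear are skipped without reading their
 * counter.  Returns true when no CPU remains pending.
 */
static bool gp_scan_mask(const long *dying, uint64_t *pending)
{
	assert(dying && pending);

	for (unsigned int cpu = 0; cpu < MASK_NR_CPUS; cpu++) {
		uint64_t bit = UINT64_C(1) << cpu;

		if ((*pending & bit) && dying[cpu] == 0)
			*pending &= ~bit;
	}
	return *pending == 0;
}
```

Unlike an index, each pass here still walks every bit position and
reads every counter that is still pending, rather than stopping at the
first nonzero one.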

> Another is making use of a scanning hierarchy.

You still have to scan all of the leaves...  Though you can divide the
work over a set of CPUs, again, as long as those CPUs are chosen so as
to balance the need to power down whole cores with the need to maintain
good memory locality.
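
One way the division of work might look, as a sketch only.  The leaf
counters are split into fixed-size groups, each scanned by one
designated CPU that reports upward once its group is all zero.  The
group size, the names (struct gp_hier, gp_hier_init, gp_scan_group),
and the single shared groups_done count are assumptions made for
illustration; choosing which CPUs do the scanning is not modeled, and
neither is any locking or memory ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define HIER_NR_CPUS	16	/* hypothetical leaf count */
#define HIER_GROUP_SIZE	4	/* hypothetical leaves per scanner */
#define HIER_NR_GROUPS	(HIER_NR_CPUS / HIER_GROUP_SIZE)

struct gp_hier {
	long dying[HIER_NR_CPUS];	/* leaf counters, one per CPU */
	size_t next[HIER_NR_GROUPS];	/* per-group resume index */
	unsigned int groups_done;	/* groups fully seen at zero */
};

static void gp_hier_init(struct gp_hier *h)
{
	memset(h, 0, sizeof(*h));
	for (size_t g = 0; g < HIER_NR_GROUPS; g++)
		h->next[g] = g * HIER_GROUP_SIZE;
}

/*
 * Run by the scanner responsible for group g.  Every leaf in the group
 * is still visited, but only by this scanner.  Returns true only for
 * the call that completes the last outstanding group.
 */
static bool gp_scan_group(struct gp_hier *h, size_t g)
{
	size_t end = (g + 1) * HIER_GROUP_SIZE;

	assert(g < HIER_NR_GROUPS);
	if (h->next[g] == end)
		return false;		/* this group already reported */

	while (h->next[g] < end) {
		if (h->dying[h->next[g]] != 0)
			return false;	/* readers remain in this group */
		h->next[g]++;
	}
	h->groups_done++;
	return h->groups_done == HIER_NR_GROUPS;
}
```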

							Thanx, Paul