linux-kernel - Re: [PATCH] generic dynamic per cpu refcounting

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20130129195141.GO26407@google.com>
Date:	Tue, 29 Jan 2013 11:51:41 -0800
From:	Kent Overstreet <koverstreet@...gle.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Oleg Nesterov <oleg@...hat.com>, srivatsa.bhat@...ux.vnet.ibm.com,
	rusty@...tcorp.com.au, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] generic dynamic per cpu refcounting

On Tue, Jan 29, 2013 at 11:29:04AM -0800, Tejun Heo wrote:
> Hey, Kent.
> 
> On Tue, Jan 29, 2013 at 08:39:42AM -0800, Kent Overstreet wrote:
> > Oh, if this is going to be widely used I should probably have a
> > different implementation for archs that don't have atomic64_t in
> > hardware. Don't suppose you know the CONFIG_ macro to test against? I
> > couldn't find it when I was looking recently.
> 
> !CONFIG_GENERIC_ATOMIC64, I think.  I don't think it's worthwhile to
> worry about tho.  Reverting to simple atomic_t ops on !CONFIG_SMP
> should cover most relevant cases.
> 
> Very tentative review follows.
> 
> > +#define PCPU_REF_PTR		0
> > +#define PCPU_REF_NONE		1
> > +#define PCPU_REF_DEAD		2
> 
> Do we still need to distinguish between NONE and DEAD?  It would be
> nice if we can just test against NULL.

With dynamic allocation, yes - we can't alloc percpu counters if it's
dead.

> > +/**
> > + * percpu_ref_get - increment a dynamic percpu refcount
> > + *
> > + * Analagous to atomic_inc().
> > +  */
> > +static inline void percpu_ref_get(struct percpu_ref *ref)
> > +{
> > +	unsigned long pcpu_count;
> > +
> > +	preempt_disable();
> > +
> > +	pcpu_count = ACCESS_ONCE(ref->pcpu_count);
> > +
> > +	if (REF_STATUS(pcpu_count) == PCPU_REF_PTR) {
> > +		/* for rcu - we're not using rcu_dereference() */
> > +		smp_read_barrier_depends();
> 
> Let's explain more. :)

I'd like to come up with a way of using rcu_dereference() without sparse
choking on __percpu combined with __rcu, but ok.

> > +		__this_cpu_inc(*((unsigned __percpu *) pcpu_count));
> 
> What about overflow?  Note that we can have systemetic cases where ref
> is gotten on one cpu and put on another transferring counts in a
> specific direction.

Heh, this keeps coming up.

It works because modular arithmatic is still associative for addition
and subtraction. It doesn't matter which cpu the gets and puts happen
on, the counters will still sum to the same value - even if individual
counters overflow/underflow.

(It's the same as doing a bunch of increments and decrements to a single
integer, but in random order. 0 - 1 - 1 + 1 + 1 + 1 will always equal 1,
no matter where you stick parentheses).

That's also why I use an unsigned integer for the per cpu counters...
since they're going to wrap and signed integer overflow is undefined.
Not that I really expect gcc to figure out a way to screw me for that,
but I tend to be anal about such things.

> > +#define PCPU_COUNT_BITS		50
> > +#define PCPU_COUNT_MASK		((1LL << PCPU_COUNT_BITS) - 1)
> > +
> > +#define PCPU_COUNT_BIAS		(1ULL << 32)
> 
> I really don't get this.  Why use 1<<32 for bias which isn't too
> difficult to overflow especially if you have many cpu threads and some
> systemetic drift in percpu refs?  What's the point of not using the
> higher bits?  Is it to encode count usage statistics into the same
> counter?  If so, just add another field.  24bytes is fine.

The systematic drift doesn't affect overflow here - the bias is only
applied to the atomic counter, which can only be off by at most whatever
the per cpu counters happen to sum to (and remember it's modular
arithmatic, so that's INT_MIN/INT_MAX).

Since I'm using 32 bit ints for the per cpu counters, 1 << 32 works just
fine. If we want to expand the per cpu counters to longs or 64 bit ints,
then yeah I'd use a larger bias.

It's nitpicky but using a larger bias feels like overly defensive
programming to me (if you know how big the bias has to be, great, use
that - if you aren't sure, wtf are you doing :P)

> I'd really like to see just basic percpu refcnt with async and sync
> interfaces w/o dynamic allocation.  At this point, we aren't really
> sure if there are gonna be enough benefits or use cases from dynamic
> allocation, so I don't really see the point in the added complexity.
> Let's start with basic stuff and add on dynamic alloc if it actually
> is necessary.

Well, with the alloc rework the dynamic allocation is completely hidden
from the users - it's just get and put. Killing dynamic allocation would
just mean that percpu_ref_init() could fail, as far as the api is
concerned.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/