linux-kernel - Re: [patch V3] percpu_counter: scalability works

Date:	Tue, 17 May 2011 11:01:01 +0200
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Shaohua Li <shaohua.li@...el.com>
Cc:	Tejun Heo <tj@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"cl@...ux.com" <cl@...ux.com>,
	"npiggin@...nel.dk" <npiggin@...nel.dk>
Subject: Re: [patch V3] percpu_counter: scalability works

On Tuesday, 17 May 2011 at 13:22 +0800, Shaohua Li wrote:

> I don't know why you said there is no good reason. I posted a lot of
> data which shows improvement, while you just ignore it.
> 

Dear Shaohua, ignoring you would mean not answering at all and letting
other people do it when they have time (maybe in 2 or 3 months, maybe
never; just take a look at my previous attempts two years ago, when
atomic64_t obviously didn't exist yet).

I hope you can see the value I add to your concerns by keeping this
subject alive and even writing code. We all share ideas; we are not
fighting.

> The size issue is completely pointless. If you have 4096 CPUs, how could
> you worry about 16k bytes of memory? Especially since the extra memory
> makes the API much faster.
> 

It is not pointless at all; maybe it is for Intel guys.

I just NACK this idea.

> > 2) Two separate alloc_percpu() -> two separate cache lines instead of
> > one.
> It might actually be in one cache line, and it can easily be fixed if
> not. On the other hand, even touching two cache lines, it's still faster
> than the original spinlock implementation, for which I already posted data.
> 
> > But then, if one alloc_percpu() -> 32 kbytes per object.
> the size issue is completely pointless
> 

That's your opinion.

> > 3) Focus on percpu_counter() implementation instead of making an
> > analysis of callers.
> > 
> > I removed a lot of rwlocks in the network stack because they are not
> > the right synchronization primitive in many cases. I did not optimize
> > rwlocks. If rwlocks were even slower, I suspect other people would have
> > helped me convert things faster.
> My original issue is mmap, but I have already stated several times that
> we can make percpu_counter better; why wouldn't we?
> 

Only if it's a good compromise. Your latest patches are not yet good
candidates, I'm afraid.

> > 4) There is a possible way to solve your deviation case: add a shortcut
> > at the beginning of _add() for crazy 'amount' values. It's a bit
> > expensive on 32-bit arches, so it might be added in a new helper to keep
> > _add() fast for normal, gentle users.
> 
> > +		if (unlikely(cmpxchg(ptr, old, 0) != old))
> > +			goto retry;
> This doesn't change anything; you still have the deviation issue here.
> 

You do understand that 'my last patch' doesn't address the deviation
problem anymore? It addresses a completely different matter: the
vm_committed_as problem (and maybe other percpu_counters).

The thing you prefer not to touch so that your 'results' look better...

If your percpu_counter is hit so hard that many CPUs compete in
atomic64_add(count, &fbc->count), the _sum() result is wrong right after
it returns. So _sum() _can_ deviate even if it claims to be more
precise.

> > +		atomic64_add(count, &fbc->count);
> 
> > if (unlikely(amount >= batch || amount <= -batch)) {
> > 	atomic64_add(amount, &fbc->count);
> > 	return;
> > }
> Why should we handle just this special case? My patch can make the whole
> part faster without deviation.
> 

This 'special case' is the whole problem others pointed out, and handling
it this way brings the worst-case deviation back to what it was before
your initial patch.

> So you actually didn't point out any obvious problem with my patch. This
> is good.
> 

This brings nothing. Just say NO to people saying it's needed.

Just because Tejun says there is a deviation "problem" doesn't mean you
need to change lglock and bring lglock into percpu_counter, or double the
percpu_counter size, or adopt whatever other crazy idea.

Just convince him that percpu_counter by itself cannot provide a maximum
deviation guarantee. No percpu_counter user cares at all. If one does,
then choosing percpu_counter for its implementation is probably wrong.

[ We don't provide a percpu_counter_add_return() function yet ]

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/