linux-kernel - Re: [patch V3] percpu_counter: scalability works

Message-ID: <20110517124528.GN20624@htj.dyndns.org>
Date:	Tue, 17 May 2011 14:45:28 +0200
From:	Tejun Heo <tj@...nel.org>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	Shaohua Li <shaohua.li@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"cl@...ux.com" <cl@...ux.com>,
	"npiggin@...nel.dk" <npiggin@...nel.dk>
Subject: Re: [patch V3] percpu_counter: scalability works

Hello, Eric.

On Tue, May 17, 2011 at 02:20:07PM +0200, Eric Dumazet wrote:
> Spikes are expected and have no effect by design.
> 
> batch value is chosen so that granularity of the percpu_counter
> (batch*num_online_cpus()) is the spike factor, and that's pretty
> difficult when number of cpus is high.
> 
> In Shaohua's workload, 'amount' for a 128Mbyte mapping is 32768, while
> the batch value is 48. 48*24 = 1152.
> So the percpu s32 being in [-47 .. 47] range would not change the
> accuracy of the _sum() function [ if it was eventually called, but it's
> not ]
> 
> No drift in the counter is the only thing we care about - and _read() being
> not too far away from the _sum() value, in particular if the
> percpu_counter is used to check a limit that happens to be low (against
> granularity of the percpu_counter : batch*num_online_cpus()).
> 
> I claim extra care is not needed. This might give the false impression
> to the reader/user that a percpu_counter object can replace a plain
> atomic64_t.

We already had this discussion.  Sure, we can argue about it again all
day but I just don't think it's a necessary compromise and it really
makes _sum() quite dubious.  It's not about strict correctness, it
can't be, but if I spent the overhead to walk all the different percpu
counters, I'd like to have a rather exact number if there's nothing
much going on (freeblock count, for example).  Also, I want to be able
to use large @batch if the situation allows for it without worrying
about _sum() accuracy.
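
To make the numbers quoted above concrete, here is a minimal
single-threaded userspace sketch of the batching scheme.  It is a
model only: struct model_counter and the model_*() helpers are
hypothetical names, not the kernel's percpu_counter API, and there is
no locking or real per-CPU storage.  It only shows how far _read()
can sit from _sum() for a given @batch.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 24   /* CPU count from the quoted example */
#define BATCH   48   /* batch value from the quoted example */

/* Hypothetical model, not the kernel's struct percpu_counter. */
struct model_counter {
    int64_t count;            /* global value, what a _read() would return */
    int32_t local[NR_CPUS];   /* per-CPU deltas not yet folded into count */
};

/* Add 'amount' on 'cpu'; fold into count once |delta| reaches BATCH. */
static void model_add(struct model_counter *c, int cpu, int32_t amount)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    int64_t delta = (int64_t)c->local[cpu] + amount;

    if (delta >= BATCH || delta <= -BATCH) {
        c->count += delta;    /* slow path: update the shared value */
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = (int32_t)delta;   /* fast path: stays local */
    }
}

/* Cheap approximate value: ignores whatever is still held per CPU. */
static int64_t model_read(const struct model_counter *c)
{
    return c->count;
}

/* Slow exact value for this model: global count plus every local delta. */
static int64_t model_sum(const struct model_counter *c)
{
    int64_t sum = c->count;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += c->local[cpu];
    return sum;
}
```

In this model, adding 47 on each of the 24 CPUs leaves model_read() at
0 while model_sum() reports 1128, which is the kind of gap bounded by
batch*num_online_cpus().  A single add of 32768 goes straight to the
global count because it exceeds the batch.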

Given that _sum() is a super-slow path and we have a lot of latitude
there, this should be possible without resorting to a heavy-handed
approach like lglock.  I was hoping that someone would come up with a
better solution, which doesn't seem to have happened.  Maybe I was
wrong, I don't know.  I'll give it a shot.

But, anyways, here's my position regarding the issue.

* If we're gonna just fix up the slow path, I don't want to make
  _sum() less useful by making its accuracy dependent upon @batch.

* If somebody is interested, it would be worthwhile to see whether we
  can integrate vmstat and percpu counters so that its deviation is
  automatically regulated and we don't have to think about all this
  anymore.

I'll see if I can come up with something.

Thank you.

-- 
tejun