Date:	Thu, 18 Feb 2016 15:12:07 -0800 (PST)
From:	Vikas Shivappa <vikas.shivappa@...el.com>
To:	Thomas Gleixner <tglx@...utronix.de>
cc:	Vikas Shivappa <vikas.shivappa@...el.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>, x86@...nel.org,
	Stephane Eranian <eranian@...gle.com>,
	Matt Fleming <matt@...eblueprint.co.uk>
Subject: Re: [PATCH] x86/perf/intel/cqm: Get rid of the silly for_each_cpu
 lookups



On Thu, 18 Feb 2016, Thomas Gleixner wrote:

> On Wed, 17 Feb 2016, Thomas Gleixner wrote:
>> On Wed, 17 Feb 2016, Vikas Shivappa wrote:
>>
>> Please stop top posting, finally!
>>
>>> But we have an extra static - static to avoid having it in the stack..
>>
>> It's not about the cpu mask on the stack. The reason was that with cpumask off
>> stack cpumask_and_mask() requires an allocation, which then can't be used in
>> the starting/dying callbacks.
>>
>> Darn, you are right to remind me.
>>
>> Now, the proper solution for this stuff is to provide a library function as we
>> need that for several drivers. No point to duplicate that functionality. I'll
>> cook something up and repost the uncore/cqm set tomorrow.
>
> Second thoughts on that.
>
> cpumask_any_but() is fine as is, if we feed it topology_core_cpumask(cpu). The
> worst case search is two bitmap_find_next() if the first search returned cpu.
>
> Now cpumask_any_and() does a search as well, but the number of
> bitmap_find_next() invocations is limited to the number of sockets if we feed
> the cqm_cpu_mask as first argument. So for 4 or 8 sockets that's still a
> reasonable limit. If the people with insane large machines care, we can
> revisit that topic. It's still faster than for_each_online_cpu() :)

Agreed. If we don't care about machines with a large number of sockets, this
is still far better than scanning each CPU. We could avoid a few more branches
by being more aggressive and removing 'all' the loops with cpumask_and (in
cpumask_any_but the 2nd search always succeeds if the 1st one fails), but
those branches should not matter much in this case.
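
To make the search bounds above concrete, here is a rough userspace sketch.
It is not the kernel implementation: a cpumask is modeled as a single 64-bit
word, and find_next(), any_but(), any_and() and nr_searches are hypothetical
stand-ins for bitmap_find_next(), cpumask_any_but() and cpumask_any_and().
It only counts how many bitmap searches each lookup needs.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 64u		/* model limit: one 64-bit word per mask */

static unsigned int nr_searches;	/* number of find_next() calls so far */

/* Lowest set bit at index >= start, or NR_CPUS if there is none. */
static unsigned int find_next(uint64_t mask, unsigned int start)
{
	nr_searches++;
	for (unsigned int i = start; i < NR_CPUS; i++)
		if (mask & (UINT64_C(1) << i))
			return i;
	return NR_CPUS;
}

/* Any CPU in mask other than cpu: never more than two searches. */
static unsigned int any_but(uint64_t mask, unsigned int cpu)
{
	unsigned int i;

	assert(cpu < NR_CPUS);
	i = find_next(mask, 0);
	if (i == cpu)			/* first hit was cpu itself, retry */
		i = find_next(mask, cpu + 1);
	return i;
}

/*
 * Any CPU set in both masks. The search walks the set bits of 'first',
 * so the sparse mask (one reader CPU per socket) should be passed first.
 */
static unsigned int any_and(uint64_t first, uint64_t second)
{
	unsigned int i = find_next(first, 0);

	while (i < NR_CPUS && !(second & (UINT64_C(1) << i)))
		i = find_next(first, i + 1);
	return i;
}
```

With one reader CPU per socket in the first mask, any_and() does roughly one
search per socket, plus one final search when nothing matches. any_but() is
independent of the socket count.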

Will send the RAPL patch separately.

Thanks,
Vikas

>
> Thanks,
>
> 	tglx
>
