Date:	Mon, 14 Nov 2011 15:57:13 +0200
From:	Gilad Ben-Yossef <gilad@...yossef.com>
To:	Hillf Danton <dhillf@...il.com>
Cc:	linux-kernel@...r.kernel.org,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Russell King <linux@....linux.org.uk>, linux-mm@...ck.org,
	Christoph Lameter <cl@...ux-foundation.org>,
	Pekka Enberg <penberg@...nel.org>
Subject: Re: [PATCH v3 4/5] slub: Only IPI CPUs that have per cpu obj to flush

On Mon, Nov 14, 2011 at 3:19 PM, Hillf Danton <dhillf@...il.com> wrote:
> On Sun, Nov 13, 2011 at 10:57 PM, Gilad Ben-Yossef <gilad@...yossef.com> wrote:
...
>>> Perhaps, the technique of local_cpu_mask defined in kernel/sched_rt.c
>>> could be used to replace the above atomic allocation.
>>>
>>
>> Thank you for taking the time to review my patch :-)
>>
>> That is indeed the direction I went with in the previous iteration of
>> this patch, with one small change: since the allocation only actually
>> occurs for CPUMASK_OFFSTACK=y, which by definition means systems with
>> lots and lots of CPUs, it is better to allocate the cpumask per
>> kmem_cache rather than per CPU, because on systems where it matters we
>> are bound to have more CPUs (e.g. 4096) than kmem_caches (~160). See
>> https://lkml.org/lkml/2011/10/23/151.
>>
>> I then went ahead and further optimized the code to only incur the
>> memory overhead of allocating those cpumasks for CPUMASK_OFFSTACK=y
>> systems. See https://lkml.org/lkml/2011/10/23/152.
>>
>> As you can see from the discussion that evolved, there seems to be an
>> agreement that the code complexity overhead involved is simply not
>> worth it for what is, unlike sched_rt, a rather esoteric case and one
>> where allocation failure is easily dealt with.
>>
> Even with the introduced overhead of allocation, IPIs could not go down
> as much as we wish, right?
>

My apologies, but I don't think I follow you -

If processor A needs processor B to do something, an IPI is the right
thing to do. Let's call those useful IPIs.

What I am trying to tackle are the places where processor B doesn't
really have anything to do and processor A is simply blindly sending
IPIs to the whole system. I call those useless IPIs.

I don't see a reason why the number of *useless* IPIs can't go down to
zero, or very close to that. Useful IPIs are fine :-)
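
To make the distinction concrete, here is a rough sketch of what I mean
(illustrative only, not the actual patch). It assumes a hypothetical
cpu_has_cached_objects() predicate standing in for the real "does this
CPU hold per-cpu objects for this cache?" check, and uses a mask-based
on_each_cpu_mask() helper (as introduced earlier in this series):

#include <linux/cpumask.h>
#include <linux/slab.h>
#include <linux/smp.h>

/*
 * Illustrative sketch only, not the actual patch: build a mask of the
 * CPUs that actually hold per-cpu objects and IPI only those.
 */
static void flush_only_useful_cpus(struct kmem_cache *s,
				   smp_call_func_t flush_fn)
{
	cpumask_var_t mask;
	int cpu;

	/*
	 * cpumask_var_t lives on the stack unless CPUMASK_OFFSTACK=y, so
	 * this becomes a real (atomic) allocation only on the very large
	 * systems where the mask would not fit on the stack anyway.
	 */
	if (!zalloc_cpumask_var(&mask, GFP_ATOMIC)) {
		/* Allocation failure is easily dealt with: broadcast. */
		on_each_cpu(flush_fn, s, 1);
		return;
	}

	for_each_online_cpu(cpu)
		if (cpu_has_cached_objects(s, cpu))	/* hypothetical */
			cpumask_set_cpu(cpu, mask);

	/* Only the useful IPIs: CPUs with nothing to flush are left alone. */
	on_each_cpu_mask(mask, flush_fn, s, 1);
	free_cpumask_var(mask);
}

The previous iteration simply kept such a mask in struct kmem_cache,
allocated once at cache creation, trading ~160 per-cache masks for
avoiding the atomic allocation in the flush path; as noted above, the
consensus was that even that complexity is not worth it here.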

Thanks,
Gilad
-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@...yossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML
