linux-kernel - Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with sched_domain

Date:	Thu, 12 May 2016 13:33:59 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Michael Neuling <mikey@...ling.org>
Cc:	Matt Fleming <matt@...eblueprint.co.uk>, mingo@...nel.org,
	linux-kernel@...r.kernel.org, clm@...com, mgalbraith@...e.de,
	tglx@...utronix.de, fweisbec@...il.com, srikar@...ux.vnet.ibm.com,
	anton@...ba.org, oliver <oohall@...il.com>,
	"Shreyas B. Prabhu" <shreyas@...ux.vnet.ibm.com>
Subject: Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with
 sched_domain_shared

On Thu, May 12, 2016 at 09:07:52PM +1000, Michael Neuling wrote:
> On Thu, 2016-05-12 at 07:07 +0200, Peter Zijlstra wrote:

> > But as per the above, Power7 and Power8 have explicit logic to share the
> > per-core L3 with the other cores.
> > 
> > How effective is that? From some of the slides/documents I've looked at,
> > the L3s are connected with a high-speed fabric, suggesting that the
> > cross-core sharing should be fairly efficient.
> 
> I'm not sure.  I thought it was mostly private but if another core was
> sleeping or not experiencing much cache pressure, another core could use it
> for some things. But I'm fuzzy on the exact properties, sorry.

Right; I'm going by bits and pieces found on the tubes, so I'm just
guessing ;-)

But it sounds like these L3s are nowhere close to what Intel does with
their L3, where each core has an L3 slice, and slices are connected on a
ring to form a unified/shared cache across all cores.

http://www.realworldtech.com/sandy-bridge/8/

> > In which case it would make sense to treat/model the combined L3 as a
> > single large LLC covering all cores.
> 
> Are you thinking it would be much cheaper to migrate a task to another core
> inside this chip, than to off chip?

Basically; and if so, whether it's cheap enough to shoot a task to an idle
core to avoid queueing. Assuming there still is some cache residency on
the old core, the inter-core fill should be much cheaper than fetching
it off package (either remote cache or DRAM).

Or at least; so goes my reasoning based on my google results.
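
To make the trade-off above concrete, here is a minimal sketch of the
comparison: migrate to an idle core only when refilling the task's
still-resident cache lines costs less than the time it would otherwise spend
queued. This is not scheduler code. The names refill_cost_ns() and
should_migrate_to_idle(), and every latency figure a caller would pass in,
are hypothetical and would have to come from measurement on the actual
hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative cost model only; nothing here exists in the kernel.
 *
 * resident_lines: cache lines the task is assumed to still have on the
 *                 old core (an estimate, not something measured here).
 * per_line_ns:    assumed cost to fill one line on the new core, e.g. a
 *                 small value for an on-chip inter-core fill and a larger
 *                 one for an off-package fill (remote cache or DRAM).
 */
static uint64_t refill_cost_ns(uint64_t resident_lines, uint64_t per_line_ns)
{
	return resident_lines * per_line_ns;
}

/*
 * Migrate to an idle core only if refilling the cache there is expected
 * to be cheaper than waiting in the current runqueue.
 */
static bool should_migrate_to_idle(uint64_t queue_wait_ns,
				   uint64_t resident_lines,
				   uint64_t per_line_ns)
{
	return refill_cost_ns(resident_lines, per_line_ns) < queue_wait_ns;
}
```

With made-up numbers: for 1000 resident lines and an expected 50000 ns of
queueing, an assumed 20 ns per-line on-chip fill favours migrating, while an
assumed 100 ns per-line off-package fill does not.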