linux-kernel - Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with sched_domain

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1463098346.25753.15.camel@neuling.org>
Date:	Fri, 13 May 2016 10:12:26 +1000
From:	Michael Neuling <mikey@...ling.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Matt Fleming <matt@...eblueprint.co.uk>, mingo@...nel.org,
	linux-kernel@...r.kernel.org, clm@...com, mgalbraith@...e.de,
	tglx@...utronix.de, fweisbec@...il.com, srikar@...ux.vnet.ibm.com,
	anton@...ba.org, oliver <oohall@...il.com>,
	"Shreyas B. Prabhu" <shreyas@...ux.vnet.ibm.com>
Subject: Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with
 sched_domain_shared

On Thu, 2016-05-12 at 13:33 +0200, Peter Zijlstra wrote:
> On Thu, May 12, 2016 at 09:07:52PM +1000, Michael Neuling wrote:
> > 
> > On Thu, 2016-05-12 at 07:07 +0200, Peter Zijlstra wrote:
> > 
> > > 
> > > But as per the above, Power7 and Power8 have explicit logic to share
> > > the
> > > per-core L3 with the other cores.
> > > 
> > > How effective is that? From some of the slides/documents i've looked
> > > at
> > > the L3s are connected with a high-speed fabric. Suggesting that the
> > > cross-core sharing should be fairly efficient.
> > I'm not sure.  I thought it was mostly private but if another core was
> > sleeping or not experiencing much cache pressure, another core could
> > use it
> > for some things. But I'm fuzzy on the the exact properties, sorry.
> Right; I'm going by bits and pieces found on the tubes, so I'm just
> guessing ;-)
> 
> But it sounds like these L3s are nowhere close to what Intel does with
> their L3, where each core has an L3 slice, and slices are connected on a
> ring to form a unified/shared cache across all cores.
> 
> http://www.realworldtech.com/sandy-bridge/8/

The POWER8 user manual is what you want to look at:

https://www.setphaserstostun.org/power8/POWER8_UM_v1.3_16MAR2016_pub.pdf

There is a section 10. "L3 Cache Overview" starting on page 128.  In there
it talks about L3.0 which is using the local cores L3.  L3.1 which is using
some other cores L3.

Once the L3.0 is full, we can cast out to an L3.1 (ie. the cache on another
core).  L3.1 can also provide data for reads.

ECO mode (section 10.4) is what I was talking about for sleeping/unused
cores.  That's more of a boot time (firmware option) than something we can
dynamically play with at runtime (I believe), so it's not something I think
is relevant here.

> > 
> > > 
> > > In which case it would make sense to treat/model the combined L3 as a
> > > single large LLC covering all cores.
> > Are you thinking it would be much cheaper to migrate a task to another
> > core
> > inside this chip, than to off chip?
> Basically; and if so, if its cheap enough to shoot a task to an idle
> core to avoid queueing. Assuming there still is some cache residency on
> the old core, the inter-core fill should be much cheaper than fetching
> it off package (either remote cache or dram).

So I think that will apply on POWER8.

In 10.4.2 it says "The L3.1 ECO Caches will be snooped and provide
intervention data similar to the L2 and L3.0 caches on the
chip"  That should be much faster than going to another chip or DIMM.

So migrating to another core on the same chip should be faster than off
chip.

Mikey




> Or at least; so goes my reasoning based on my google results.
>