Message-ID: <20200729061355.GA14603@linux.vnet.ibm.com>
Date: Wed, 29 Jul 2020 11:43:55 +0530
From: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
To: Valentin Schneider <valentin.schneider@....com>
Cc: Michael Ellerman <mpe@...erman.id.au>,
linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
LKML <linux-kernel@...r.kernel.org>,
Nicholas Piggin <npiggin@...il.com>,
Anton Blanchard <anton@...abs.org>,
"Oliver O'Halloran" <oohall@...il.com>,
Nathan Lynch <nathanl@...ux.ibm.com>,
Michael Neuling <mikey@...ling.org>,
Gautham R Shenoy <ego@...ux.vnet.ibm.com>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Jordan Niethe <jniethe5@...il.com>
Subject: Re: [PATCH v4 09/10] Powerpc/smp: Create coregroup domain
* Valentin Schneider <valentin.schneider@....com> [2020-07-28 16:03:11]:
Hi Valentin,
Thanks for looking into the patches.
> On 27/07/20 06:32, Srikar Dronamraju wrote:
> > Add percpu coregroup maps and masks to create coregroup domain.
> > If a coregroup doesn't exist, the coregroup domain will be degenerated
> > in favour of SMT/CACHE domain.
> >
>
> So there's at least one arm64 platform out there with the same "pairs of
> cores share L2" thing (Ampere eMAG), and that lives quite happily with the
> default scheduler topology (SMT/MC/DIE). Each pair of core gets its MC
> domain, and the whole system is covered by DIE.
>
> Now arguably it's not a perfect representation; DIE doesn't have
> SD_SHARE_PKG_RESOURCES so the highest level sd_llc can point to is MC. That
> will impact all callsites using cpus_share_cache(): in the eMAG case, only
> pairs of cores will be seen as sharing cache, even though *all* cores share
> the same L3.
>
Okay, it's good to know that we have a chip which is similar to P9 in
topology.
> I'm trying to paint a picture of what the P9 topology looks like (the one
> you showcase in your cover letter) to see if there are any similarities;
> from what I gather in [1], wikichips and your cover letter, with P9 you can
> have something like this in a single DIE (somewhat unsure about L3 setup;
> it looks to be distributed?)
>
> +---------------------------------------------------------------------+
> |                                 L3                                  |
> +---------------+-+---------------+-+---------------+-+---------------+
> |      L2       | |      L2       | |      L2       | |      L2       |
> +------+-+------+ +------+-+------+ +------+-+------+ +------+-+------+
> |  L1  | |  L1  | |  L1  | |  L1  | |  L1  | |  L1  | |  L1  | |  L1  |
> +------+ +------+ +------+ +------+ +------+ +------+ +------+ +------+
> |4 CPUs| |4 CPUs| |4 CPUs| |4 CPUs| |4 CPUs| |4 CPUs| |4 CPUs| |4 CPUs|
> +------+ +------+ +------+ +------+ +------+ +------+ +------+ +------+
>
> Which would lead to (ignoring the whole SMT CPU numbering shenanigans)
>
> NUMA     [                                                ...
> DIE      [                                             ]
> MC       [         ] [         ] [         ] [         ]
> BIGCORE  [         ] [         ] [         ] [         ]
> SMT      [   ] [   ] [   ] [   ] [   ] [   ] [   ] [   ]
>          00-03 04-07 08-11 12-15 16-19 20-23 24-27 28-31  <other node here>
>
What you have summed up is exactly what a P9 topology looks like. I don't
think I could have explained it better than this.
> This however has MC == BIGCORE; what makes it you can have different spans
> for these two domains? If it's not too much to ask, I'd love to have a P9
> topology diagram.
>
> [1]: 20200722081822.GG9290@...ux.vnet.ibm.com
At this time the current topology is good enough, i.e. BIGCORE would
always be equal to MC. However, in future we could have chips with a
smaller or larger number of CPUs sharing the LLC than in a BIGCORE, or we
could have granular or split L3 caches within a DIE. In such a case
BIGCORE != MC.
Also, in the current P9 itself, two neighbouring core-pairs form a quad.
Cache latency within a quad is better than latency to a distant core-pair.
Cache latency within a core-pair is way better than latency within a quad.
So if we have only 4 threads running on a DIE, all of them accessing the
same cache-lines, then we could probably benefit if all the tasks were to
run within the quad, aka MC/Coregroup.
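As a rough illustration of the two cases above (MC == BIGCORE today versus
MC spanning a quad), here is a small user-space model. It is only a sketch
and not kernel code: the helper names span() and build_domains() are made
up for this example, the widths come from the diagram earlier in this
thread (SMT=4, BIGCORE=8, DIE=32), and the 16-CPU quad layout is an
assumption. A level is dropped when its span equals the span of the level
below it; the real kernel check also compares domain flags.

```python
# Simplified model of scheduler-domain degeneration (illustrative only).

def span(cpu, width):
    """CPUs that share an aligned group of `width` CPUs with `cpu`."""
    base = cpu - cpu % width
    return frozenset(range(base, base + width))

def build_domains(cpu, levels):
    """Return the (name, span) levels that survive for `cpu`, bottom up."""
    kept = []
    for name, width in levels:
        s = span(cpu, width)
        if kept and kept[-1][1] == s:
            continue  # same span as the level below: this level degenerates
        kept.append((name, s))
    return kept

# Today's P9 as drawn in this thread: MC covers the same CPUs as BIGCORE.
P9_TODAY = [("SMT", 4), ("BIGCORE", 8), ("MC", 8), ("DIE", 32)]

# Assumed layout where MC/Coregroup covers a quad (two core-pairs).
P9_QUAD = [("SMT", 4), ("BIGCORE", 8), ("MC", 16), ("DIE", 32)]

if __name__ == "__main__":
    for label, levels in (("today", P9_TODAY), ("quad", P9_QUAD)):
        names = [name for name, _ in build_domains(0, levels)]
        print(label, names)
```

In the first layout MC collapses because it adds nothing over BIGCORE; in
the second it survives as a distinct level between BIGCORE and DIE.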
I have found that some latency-sensitive benchmarks benefit from grouping
at the quad level (using kernel hacks, not backed by firmware changes).
Gautham also found similar results in his experiments, but he only used
binding within the stock kernel.
I am not setting SD_SHARE_PKG_RESOURCES in the MC/Coregroup sd_flags, since
the MC domain need not be the LLC domain on Power.
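To make the effect of leaving the flag off MC concrete, here is a small
user-space sketch of how the LLC domain is picked: walk up from the lowest
level and stop at the first one without SD_SHARE_PKG_RESOURCES. This is a
simplified model and not the kernel implementation. The function names are
only for illustration, the 16-CPU MC span is an assumed quad layout, and
treating SMT and BIGCORE as carrying the flag is my assumption for this
example.

```python
# Simplified model of sd_llc selection (illustrative only).

def domains_for(cpu):
    """Bottom-up (name, span, has SD_SHARE_PKG_RESOURCES) for `cpu`."""
    def span(width):
        base = cpu - cpu % width
        return frozenset(range(base, base + width))
    return [
        ("SMT", span(4), True),       # assumed to carry the flag
        ("BIGCORE", span(8), True),   # assumed to carry the flag
        ("MC", span(16), False),      # flag deliberately not set
        ("DIE", span(32), False),
    ]

def llc_span(cpu):
    """Span of the highest contiguous level that carries the flag."""
    llc = None
    for _name, cpus, shares_cache in domains_for(cpu):
        if not shares_cache:
            break
        llc = cpus
    return llc

def cpus_share_cache(a, b):
    """True if `b` lies inside the LLC span computed for `a`."""
    llc = llc_span(a)
    return llc is not None and b in llc

if __name__ == "__main__":
    print(sorted(llc_span(0)))
    print(cpus_share_cache(0, 7), cpus_share_cache(0, 8))
```

With these assumptions CPUs 0 and 8 sit in the same MC/Coregroup but are
not reported as sharing cache, because the LLC span stops at BIGCORE.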
--
Thanks and Regards
Srikar Dronamraju