Message-ID: <SJ1PR11MB6083D2481D47104FE67354A3FCAC9@SJ1PR11MB6083.namprd11.prod.outlook.com>
Date: Tue, 28 Feb 2023 18:04:04 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: James Morse <james.morse@....com>,
"Yu, Fenghua" <fenghua.yu@...el.com>,
"Chatre, Reinette" <reinette.chatre@...el.com>,
Peter Newman <peternewman@...gle.com>,
Jonathan Corbet <corbet@....net>,
"x86@...nel.org" <x86@...nel.org>
CC: Shaopeng Tan <tan.shaopeng@...itsu.com>,
Jamie Iles <quic_jiles@...cinc.com>,
Babu Moger <babu.moger@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"patches@...ts.linux.dev" <patches@...ts.linux.dev>
Subject: RE: [PATCH 0/7] x86/resctrl: Add support for Sub-NUMA cluster (SNC) systems
> > Intel server systems starting with Skylake support a mode that logically
> > partitions each socket. E.g. when partitioned two ways, half the cores,
> > L3 cache, and memory controllers are allocated to each of the partitions.
> > This may reduce average latency to access L3 cache and memory, with the
> > tradeoff that only half the L3 cache is available for subnode-local memory
> > access.
>
> I couldn't find a description of what happens to the CAT bitmaps or counters.
No changes to CAT. The cache is partitioned between sub-NUMA nodes based
on the index, not by dividing the ways. E.g. an 8-way associative 32MB cache is
still 8-way associative in each sub-node, but with 16MB available to each node.
This means users who want a specific amount of cache will need to set
more bits in the cache way mask (because each way is half as big).
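
A rough sketch of that arithmetic, using the 8-way 32MB example above. The
function name and parameters are made up for illustration and are not part
of resctrl; it only assumes that each way's capacity is split evenly
between the SNC nodes of a socket.

```python
import math

def way_mask_bits_needed(target_mb, cache_mb=32, num_ways=8, snc_nodes=1):
    """Return how many way-mask bits cover target_mb of node-local cache.

    Hypothetical helper: assumes every way is split evenly across the
    SNC nodes of a socket, as in the 8-way 32MB example.
    """
    # Capacity one way contributes to a single SNC node.
    way_mb = cache_mb / snc_nodes / num_ways
    bits = math.ceil(target_mb / way_mb)
    if bits > num_ways:
        raise ValueError("target exceeds the cache available to one node")
    return bits

# 8MB with SNC disabled: each way is 4MB, so 2 bits.
print(way_mask_bits_needed(8, snc_nodes=1))
# 8MB with 2 SNC nodes per socket: each way is 2MB per node, so 4 bits.
print(way_mask_bits_needed(8, snc_nodes=2))
```

So the same 8MB request needs twice as many mask bits once the socket is
split two ways.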
> Presumably the CAT bitmaps are duplicated, so each cluster has its own set, and
> the counters aren't - so software has to co-ordinate the use of RMID across the CPUs?
Nope. Still one set of CAT bitmaps per socket.
With "N" RMIDs available on a system with SNC disabled, there will be N/2 available
when there are 2 SNC nodes per socket. Processes use values [0 .. N/2).
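
As a minimal sketch of the RMID range described above: the helper below is
hypothetical (not a kernel or resctrl interface), and the total of 256
RMIDs is an assumed value chosen only to make the numbers concrete.

```python
def usable_rmids(total_rmids, snc_nodes_per_socket):
    """Return the RMID values processes can use, as a half-open range.

    Hypothetical helper: assumes the N RMIDs of an SNC-disabled system
    are divided evenly by the number of SNC nodes per socket.
    """
    return range(0, total_rmids // snc_nodes_per_socket)

# Assumed N = 256 for illustration only.
rmids = usable_rmids(256, 2)
print(rmids.start, rmids.stop)  # processes use [0 .. 128)
```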
> How come cacheinfo isn't modified to report the L3 partitions as separate caches?
> Otherwise user-space would assume the full size of the cache is available on any of those
> CPUs.
The size of the cache is perhaps poorly defined in the SNC-enabled case. A
well-behaved NUMA application that only accesses memory from its local node will
see an effective cache half the size. But if a process accesses memory from the
other SNC node on the same socket, then it will get allocations in that SNC node's
half share of the cache. Accessing memory across inter-socket links will end up
allocating across the whole cache.
Moral: SNC mode is intended for applications that have very well-behaved NUMA
characteristics.
-Tony