linux-kernel - RE: [PATCH 0/7] x86/resctrl: Add support for Sub-NUMA cluster (SNC) systems

Message-ID: <SJ1PR11MB6083D2481D47104FE67354A3FCAC9@SJ1PR11MB6083.namprd11.prod.outlook.com>
Date:   Tue, 28 Feb 2023 18:04:04 +0000
From:   "Luck, Tony" <tony.luck@...el.com>
To:     James Morse <james.morse@....com>,
        "Yu, Fenghua" <fenghua.yu@...el.com>,
        "Chatre, Reinette" <reinette.chatre@...el.com>,
        Peter Newman <peternewman@...gle.com>,
        Jonathan Corbet <corbet@....net>,
        "x86@...nel.org" <x86@...nel.org>
CC:     Shaopeng Tan <tan.shaopeng@...itsu.com>,
        Jamie Iles <quic_jiles@...cinc.com>,
        Babu Moger <babu.moger@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        "patches@...ts.linux.dev" <patches@...ts.linux.dev>
Subject: RE: [PATCH 0/7] x86/resctrl: Add support for Sub-NUMA cluster (SNC)
 systems

> > Intel server systems starting with Skylake support a mode that logically
> > partitions each socket. E.g. when partitioned two ways, half the cores,
> > L3 cache, and memory controllers are allocated to each of the partitions.
> > This may reduce average latency to access L3 cache and memory, with the
> > tradeoff that only half the L3 cache is available for subnode-local memory
> > access.
>
> I couldn't find a description of what happens to the CAT bitmaps or counters.

No changes to CAT. The cache is partitioned between sub-NUMA nodes based
on the index, not by dividing the ways. E.g. an 8-way associative 32MB cache is
still 8-way associative in each sub-node, but with 16MB available to each node.

This means users who want a specific amount of cache will need to allocate
more bits in the cache way mask (because each way is half as big).
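
To make the arithmetic concrete, here is a rough sketch using the 8-way 32MB
example above. The helper names (ways_needed, contiguous_mask) and the 8MB
target are made up for illustration and are not part of resctrl. It assumes
the cache splits evenly between SNC nodes and that the mask is a contiguous
run of low bits.

```python
MIB = 1024 * 1024

def ways_needed(target_bytes, cache_bytes, num_ways, snc_nodes=1):
    """Return how many cache ways cover target_bytes for one SNC node.

    Hypothetical helper: assumes the cache is split evenly by index
    between SNC nodes, so every node keeps all the ways but each way
    holds 1/snc_nodes as much data.
    """
    way_bytes = cache_bytes // snc_nodes // num_ways
    ways = -(-target_bytes // way_bytes)  # ceiling division
    if ways > num_ways:
        raise ValueError("request exceeds the cache available to one node")
    return ways

def contiguous_mask(ways):
    """Build a mask with the low `ways` bits set (illustrative only)."""
    return (1 << ways) - 1

if __name__ == "__main__":
    # 8MB wanted from an 8-way 32MB cache, SNC off vs. two SNC nodes.
    for snc in (1, 2):
        n = ways_needed(8 * MIB, 32 * MIB, 8, snc)
        print(f"snc_nodes={snc}: {n} ways, mask={contiguous_mask(n):#x}")
```

With SNC off each way is 4MB, so two bits cover 8MB. With two SNC nodes each
way is 2MB, so the same request takes four bits.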

> Presumably the CAT bitmaps are duplicated, so each cluster has its own set, and
> the counters aren't - so software has to co-ordinate the use of RMID across the CPUs?

Nope. Still one set of CAT bitmaps per socket.

With "N" RMIDs available on a system with SNC disabled, there will be N/2 available
when there are 2 SNC nodes per socket. Processes use values [0 .. N/2).
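
As a small sketch of that range, assuming the RMID count divides evenly
between SNC nodes. The function name and the total of 256 are hypothetical,
picked only to have numbers to show.

```python
def usable_rmids(total_rmids, snc_nodes):
    """Return the half-open range [0, N/snc_nodes) of RMIDs processes can use.

    Hypothetical helper: total_rmids is the count reported with SNC
    disabled, snc_nodes is the number of SNC nodes per socket.
    """
    return range(0, total_rmids // snc_nodes)

if __name__ == "__main__":
    rmids = usable_rmids(256, 2)  # 256 is an assumed example value
    print(f"usable RMIDs: [{rmids.start} .. {rmids.stop})")
```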

> How come cacheinfo isn't modified to report the L3 partitions as separate caches?
> Otherwise user-space would assume the full size of the cache is available on any of those
> CPUs.

The size of the cache is perhaps poorly defined in the SNC-enabled case. A
well-behaved NUMA application that is only accessing memory from its local node
will see an effective cache half the size. But if a process accesses memory from
the other SNC node on the same socket, then it will get allocations in that SNC
node's half share of the cache. Accessing memory across inter-socket links will
end up allocating across the whole cache.
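
A toy model of the three cases above, purely as illustration. The function
and its arguments are invented here and do not come from the kernel or the
hardware documentation. It assumes an even split of the L3 between SNC
nodes.

```python
MIB = 1024 * 1024

def allocation_scope(cache_bytes, snc_nodes, memory_node=None):
    """Return (node, bytes) describing where an access can allocate in L3.

    Hypothetical model: memory_node is the index of the SNC node on this
    socket that owns the memory being accessed, or None when the memory
    sits behind an inter-socket link.
    """
    if memory_node is None:
        # Remote-socket memory: allocations spread across the whole cache.
        return (None, cache_bytes)
    # Same-socket memory: allocations land in the owning node's share,
    # whether that is the local node or the sibling node.
    return (memory_node, cache_bytes // snc_nodes)

if __name__ == "__main__":
    # A process running on SNC node 0 of a socket with a 32MB L3.
    for label, node in (("local", 0), ("sibling", 1), ("remote socket", None)):
        where, size = allocation_scope(32 * MIB, 2, node)
        print(f"{label}: node={where}, {size // MIB}MB")
```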

Moral: SNC mode is intended for applications that have very well-behaved NUMA
characteristics.

-Tony