linux-kernel - Re: [PATCH 7/7] x86/resctrl: Determine if Sub-NUMA Cluster is enabled and initialize.

Message-ID: <ZBDXzz+f1nSP1Ml0@agluck-desk3.sc.intel.com>
Date:   Tue, 14 Mar 2023 13:23:43 -0700
From:   Tony Luck <tony.luck@...el.com>
To:     "Moger, Babu" <babu.moger@....com>
Cc:     Fenghua Yu <fenghua.yu@...el.com>,
        Reinette Chatre <reinette.chatre@...el.com>,
        Peter Newman <peternewman@...gle.com>,
        Jonathan Corbet <corbet@....net>, x86@...nel.org,
        Shaopeng Tan <tan.shaopeng@...itsu.com>,
        James Morse <james.morse@....com>,
        Jamie Iles <quic_jiles@...cinc.com>,
        linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
        patches@...ts.linux.dev
Subject: Re: [PATCH 7/7] x86/resctrl: Determine if Sub-NUMA Cluster is
 enabled and initialize.

On Tue, Feb 28, 2023 at 01:51:32PM -0600, Moger, Babu wrote:
> I am thinking out loud here.
> When a new monitor group is created, a new RMID is assigned. This is done by
> alloc_rmid. It does not know about the rmid_offset details. It will
> allocate one of the free RMIDs.
> 
> When CPUs are assigned to the group, the per-CPU pqr_state is updated.
> At that point, this RMID becomes the default_rmid for that CPU.
> 
> But CPUs can be assigned from two different Sub-NUMA nodes.
> 
> Considering same example you mentioned.
> 
> E.g. in 2-way Sub-NUMA cluster with 200 RMID counters there are only
> 100 available counters to the resctrl code. When running on the first
> SNC node RMID values 0..99 are used as before. But when running on the
> second node, a task that is assigned resctrl rmid=10 must load 10+100
> into IA32_PQR_ASSOC to use RMID counter 110.
> 
> #mount -t resctrl resctrl /sys/fs/resctrl/
> #cd /sys/fs/resctrl/
> #mkdir test  (Let's say RMID 1 is allocated)
> #cd test
> #echo 1 > cpus_list
> #echo 101 > cpus_list
> 
> In this case, the following code may run on two different RMIDs even
> though it was intended to run on same RMID.
> 
> wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid + this_cpu_read(rmid_offset));
> 
> Have you thought of this problem?

Now I've thought about this. I don't think it is a problem.

With SNC enabled for two nodes per socket, the available RMIDs
are divided between the SNC nodes. For some purposes they are
numbered [0 .. N/2), but in other cases they must be viewed as two
separate sets: [0 .. N/2) on the first node and [N/2 .. N) on
the second.

In your example RMID 1 is assigned to the group and you have
one CPU from each node in the group. Processes on CPU1 will
load IA32_PQR_ASSOC.RMID = 1, while processes on CPU101 will
set IA32_PQR_ASSOC.RMID = 101. So counts of memory bandwidth
and cache occupancy will be in two different physical RMID
counters.
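
As a rough illustration of that arithmetic (not code from the patch), the
mapping from the RMID that resctrl hands out to the value loaded into
IA32_PQR_ASSOC could be sketched as below. The helper name
snc_physical_rmid() and its parameters are hypothetical; the patch itself
uses a per-CPU rmid_offset instead.

```c
#include <assert.h>

/*
 * Hypothetical helper, for illustration only.
 * logical_rmid: RMID allocated by resctrl, in [0 .. N/snc_ways)
 * snc_node:     index of the SNC node within the socket (0-based)
 * total_rmids:  N, the number of hardware RMID counters
 * snc_ways:     number of SNC nodes per socket (2 in the example)
 */
static unsigned int snc_physical_rmid(unsigned int logical_rmid,
				      unsigned int snc_node,
				      unsigned int total_rmids,
				      unsigned int snc_ways)
{
	unsigned int per_node;

	assert(snc_ways > 0);
	assert(snc_node < snc_ways);

	/* Each SNC node owns an equal slice of the hardware counters. */
	per_node = total_rmids / snc_ways;
	assert(logical_rmid < per_node);

	/* Equivalent to rmid + rmid_offset for a CPU on snc_node. */
	return logical_rmid + snc_node * per_node;
}
```

With 200 counters and 2-way SNC, logical RMID 1 stays 1 on the first node
and becomes 101 on the second, matching the CPU1/CPU101 case above.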

To read these back, the user needs to look up which $node each CPU
belongs to and then read from the appropriate
mon_data/mon_L3_$node/{llc_occupancy,mbm_local_bytes,mbm_total_bytes}
file.

$ cat mon_data/mon_L3_00/llc_occupancy # reads RMID=1
$ cat mon_data/mon_L3_01/llc_occupancy # reads RMID=101
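
A user-space tool could build the matching file name once it knows the
node. This is only a sketch: snc_mon_path() is a hypothetical helper, and
the two-digit node suffix is assumed from the mon_L3_00/mon_L3_01 names in
the example above.

```c
#include <stdio.h>

/*
 * Hypothetical helper, for illustration only.
 * Writes the path of a monitoring file, relative to the group directory,
 * for the given SNC node and event name (e.g. "llc_occupancy").
 * Returns the snprintf() result: the length the full path needs.
 */
static int snc_mon_path(char *buf, size_t len, unsigned int node,
			const char *event)
{
	/* Assumes node IDs are printed as two decimal digits. */
	return snprintf(buf, len, "mon_data/mon_L3_%02u/%s", node, event);
}
```

The caller still has to determine which node a given CPU belongs to; that
lookup is not shown here.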

-Tony