Message-ID: <SJ1PR11MB60832422CBDCCDA580010769FC2D2@SJ1PR11MB6083.namprd11.prod.outlook.com>
Date: Mon, 18 Mar 2024 19:34:28 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: "Chatre, Reinette" <reinette.chatre@...el.com>, James Morse
<james.morse@....com>
CC: "Wieczor-Retman, Maciej" <maciej.wieczor-retman@...el.com>, "Yu, Fenghua"
<fenghua.yu@...el.com>, Shuah Khan <shuah@...nel.org>,
"ilpo.jarvinen@...ux.intel.com" <ilpo.jarvinen@...ux.intel.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>
Subject: RE: [PATCH 4/4] selftests/resctrl: Adjust SNC support messages
> >> While that is in some ways a more accurate view, it breaks a lot of
> >> legacy monitoring applications that expect the "L3" names.
> >
> > True - but the behaviour is different from a non-SNC system. If this software can read the
> > file but goes wrong because the contents of the file represent something different, it's
> > still broken.
>
> This is a good point. There is also /sys/fs/resctrl/info/L3_MON to consider and trying to think
> what to do about that makes me go in circles about when user space may expect resctrl to indicate
> the resource and when user space may expect resctrl to indicate the scope. For example,
> /sys/fs/resctrl/mon_data/mon_L3_00 contains files with data that monitor the
> "L3" _resource_, no? If we change that to /sys/fs/resctrl/mon_data/mon_NODE_00 then it
> switches the meaning of the middle term to be "scope" while it still contains the monitoring
> data of the "L3" resource. So does that mean user space would need to rely on
> /sys/fs/resctrl/info/L3_MON to obtain the information about which monitoring files
> (/sys/fs/resctrl/info/L3_MON/mon_features) are related to the particular resource and then
> match those filenames with the filenames in /sys/fs/resctrl/mon_data/mon_NODE_00 to know
> which resource it applies to and learn from the directory name what scope measurement is at?
Reinette,
It's both a wave and a particle, depending on the observer.
In SNC systems resources on each socket are divided into 2, 3, or 4 nodes. But the
division is complicated. Memory and CPU cores are easy. They are each assigned
to an SNC node. The cache is more complicated. The hash function for memory
address to cache index is the part that is SNC aware. So memory on SNC node1
will allocate in the cache indices assigned to SNC node1. But that function has to
be independent of which CPU is doing the access. That's why I keep mentioning
"well behaved NUMA applications" when talking about SNC.
So the resctrl monitoring operations still work on the L3 cache, but in SNC mode
they work on a portion of the L3 cache. As long as all accesses are NUMA local you
can think of the cache as partitioned between the SNC nodes.
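To make the partitioning concrete, here is a toy model in Python. It is only a sketch: the real hash function is not described in this thread, and every name and constant below (NODE_MEM_SIZE, INDICES_PER_NODE, cache_index, and so on) is made up for illustration. The one property it is meant to show is that the cache index depends on which SNC node owns the memory, not on which CPU does the access.

```python
# Toy model only. The real address-to-index hash is not described here;
# all names and sizes below are hypothetical.
NODES_PER_SOCKET = 2       # assumed: SNC-2
INDICES_PER_NODE = 4       # assumed: tiny cache, 4 indices per SNC node
NODE_MEM_SIZE = 1 << 20    # assumed: each node owns a contiguous 1 MiB range


def memory_node(addr):
    """SNC node that owns the memory at this address (toy layout)."""
    return (addr // NODE_MEM_SIZE) % NODES_PER_SOCKET


def cache_index(addr, cpu_node):
    """Map an address to a cache index.

    cpu_node is deliberately unused: the mapping must be the same
    no matter which CPU issues the access.
    """
    slot = (addr >> 6) % INDICES_PER_NODE  # 64-byte lines
    return memory_node(addr) * INDICES_PER_NODE + slot


def index_owner(index):
    """SNC node whose portion of the cache contains this index."""
    return index // INDICES_PER_NODE


addr = NODE_MEM_SIZE + 64  # memory that lives on SNC node 1
local = cache_index(addr, cpu_node=1)
remote = cache_index(addr, cpu_node=0)
```

In this model a CPU on node 0 touching node 1 memory still lands in node 1's portion of the cache, which is why a per-node view only holds up when accesses are NUMA local.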
But not everything is well behaved from a NUMA perspective. It would be misleading
to describe the occupancy and bandwidth as belonging to an SNC node.
It's also a bit misleading to describe in terms of an L3 cache instance. But doing
so doesn't require application changes.
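For reference, this is roughly what such an application does today. A minimal sketch, assuming the usual layout under /sys/fs/resctrl (info/L3_MON/mon_features listing event names, and one mon_data/mon_L3_* directory per domain); the function name read_l3_monitors is hypothetical. A tool written like this keeps running in SNC mode because the "L3" names are still there, even though each directory then covers only a portion of the cache.

```python
from pathlib import Path


def read_l3_monitors(root="/sys/fs/resctrl"):
    """Return {domain_dir_name: {event_name: value}} for the default group.

    Values are ints when the file holds a plain number, otherwise the raw
    string is kept (a file may hold text instead of a counter value).
    """
    root = Path(root)
    # Event names advertised for the L3 monitoring resource.
    features = (root / "info" / "L3_MON" / "mon_features").read_text().split()

    result = {}
    for domain in sorted((root / "mon_data").glob("mon_L3_*")):
        values = {}
        for name in features:
            path = domain / name
            if not path.is_file():
                # Not every advertised name has a matching data file.
                continue
            text = path.read_text().strip()
            values[name] = int(text) if text.isdigit() else text
        result[domain.name] = values
    return result
```

Calling read_l3_monitors() on a mounted resctrl filesystem returns one entry per mon_L3_* directory; the root argument is only there so the function can be pointed at a test tree.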
-Tony