Message-ID: <cover.1724970211.git.reinette.chatre@intel.com>
Date: Thu, 29 Aug 2024 15:52:26 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: fenghua.yu@...el.com,
shuah@...nel.org,
tony.luck@...el.com,
peternewman@...gle.com,
babu.moger@....com,
ilpo.jarvinen@...ux.intel.com
Cc: maciej.wieczor-retman@...el.com,
reinette.chatre@...el.com,
linux-kselftest@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: [PATCH 0/6] selftests/resctrl: Support diverse platforms with MBM and MBA tests
The resctrl selftests for Memory Bandwidth Allocation (MBA) and Memory
Bandwidth Monitoring (MBM) are failing on some (for example [1]) Emerald
Rapids systems. The test failures result from the following two
properties of these systems:
1) Emerald Rapids systems can have up to 320MB L3 cache. The resctrl
MBA and MBM selftests measure memory traffic for which a hardcoded
250MB buffer has been sufficient so far. On platforms with L3 cache
larger than the buffer, the buffer fits in the L3 cache and thus
no/very little memory traffic is generated during the "memory
bandwidth" tests.
2) Some platform features, for example RAS features or memory
performance features that generate memory traffic, may drive accesses
that are counted differently by performance counters and by MBM, for
instance by generating "overhead" traffic which is not counted
against any specific RMID. Until now these counting differences have
always been "in the noise". On Emerald Rapids systems the maximum
MBA throttling (10% memory bandwidth) reduces memory bandwidth to
the point where memory accesses by these other platform features
push the difference between memory controller performance counters
and resctrl (MBM) beyond the tests' hardcoded tolerance.
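To illustrate the second property, here is a minimal sketch of how a
roughly constant amount of unattributed traffic turns into a large
relative difference once bandwidth is throttled. The overhead and
tolerance values, and the names diff_percent() and within_tolerance(),
are assumptions made for illustration; they are not taken from the
selftests.

```c
/*
 * Percent difference between the bandwidth seen by the memory
 * controller performance counters and the bandwidth reported by MBM.
 * Both values are in MiB/s.
 */
static double diff_percent(double imc_bw, double mbm_bw)
{
	double diff = imc_bw - mbm_bw;

	if (imc_bw <= 0)
		return 0;
	if (diff < 0)
		diff = -diff;
	return 100.0 * diff / imc_bw;
}

/* Hypothetical numbers, for illustration only. */
#define OVERHEAD_BW	50.0	/* MiB/s of traffic not counted against any RMID */
#define TOLERANCE_PCT	8.0	/* assumed test tolerance */

/* The counters see the MBM traffic plus the unattributed overhead. */
static int within_tolerance(double mbm_bw)
{
	return diff_percent(mbm_bw + OVERHEAD_BW, mbm_bw) <= TOLERANCE_PCT;
}
```

With these assumed numbers, within_tolerance(5000.0) holds because the
overhead is about 1% of the total, while within_tolerance(400.0) does
not because the same overhead is now about 11% of the total.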
Make the tests more robust against platform variations:
1) Let the buffer used by memory bandwidth tests be guided by the size
of the L3 cache.
2) Larger buffers require longer initialization time before the buffer can
be used for measurement. Rework the tests to ensure that buffer
initialization is complete before measurements start.
3) Do not compare performance counters and MBM measurements at low
bandwidth. The value of "low" is hardcoded to 750MiB based on
measurements on Emerald Rapids, Sapphire Rapids, and Ice Lake
systems. This limit is not applicable to AMD systems since it
only applies to the MBA and MBM tests that are isolated to Intel.
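Changes 1) and 3) above can be sketched roughly as follows. This is
not the code from the patches: the factor of two, the use of the
previous 250MB size as a lower bound, the rule that both readings must
reach the limit, and all names (pick_span(), should_compare(),
SPAN_FACTOR, MIN_SPAN, LOW_BW_LIMIT_MIB) are assumptions made for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define MB			(1024UL * 1024UL)
#define MIN_SPAN		(250 * MB)	/* previously hardcoded buffer size */
#define SPAN_FACTOR		2		/* assumed multiple of the L3 size */
#define LOW_BW_LIMIT_MIB	750		/* "low" bandwidth limit named above */

/*
 * Choose a buffer size guided by the L3 cache size so that the buffer
 * cannot fit in the cache, and never go below the old default.
 */
static size_t pick_span(size_t l3_size)
{
	size_t span = l3_size * SPAN_FACTOR;

	return span > MIN_SPAN ? span : MIN_SPAN;
}

/*
 * Compare performance counters against MBM only when both readings
 * are at or above the low bandwidth limit.
 */
static bool should_compare(unsigned long imc_bw_mib, unsigned long mbm_bw_mib)
{
	return imc_bw_mib >= LOW_BW_LIMIT_MIB &&
	       mbm_bw_mib >= LOW_BW_LIMIT_MIB;
}
```

Under these assumptions a 320MB L3 cache yields a 640MB buffer, a 32MB
L3 cache keeps the 250MB buffer, and a run measuring around 500MiB is
reported without being checked against the tolerance.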
[1] https://ark.intel.com/content/www/us/en/ark/products/237261/intel-xeon-platinum-8592-processor-320m-cache-1-9-ghz.html
Reinette Chatre (6):
selftests/resctrl: Fix sparse warnings
selftests/resctrl: Ensure measurements skip initialization of default
benchmark
selftests/resctrl: Simplify benchmark parameter passing
selftests/resctrl: Use cache size to determine "fill_buf" buffer size
selftests/resctrl: Do not compare performance counters and resctrl at
low bandwidth
selftests/resctrl: Keep results from first test run
tools/testing/selftests/resctrl/cmt_test.c | 33 +--
tools/testing/selftests/resctrl/fill_buf.c | 19 +-
tools/testing/selftests/resctrl/mba_test.c | 26 +-
tools/testing/selftests/resctrl/mbm_test.c | 25 +-
tools/testing/selftests/resctrl/resctrl.h | 57 +++--
.../testing/selftests/resctrl/resctrl_tests.c | 15 +-
tools/testing/selftests/resctrl/resctrl_val.c | 223 +++++-------------
7 files changed, 152 insertions(+), 246 deletions(-)
--
2.46.0