Message-ID: <aa643c9b-8ce5-4cb1-98f6-645224aafdf8@linuxfoundation.org>
Date: Thu, 24 Oct 2024 16:36:19 -0600
From: Shuah Khan <skhan@...uxfoundation.org>
To: Reinette Chatre <reinette.chatre@...el.com>, fenghua.yu@...el.com,
shuah@...nel.org, tony.luck@...el.com, peternewman@...gle.com,
babu.moger@....com, ilpo.jarvinen@...ux.intel.com
Cc: maciej.wieczor-retman@...el.com, linux-kselftest@...r.kernel.org,
linux-kernel@...r.kernel.org, Shuah Khan <skhan@...uxfoundation.org>
Subject: Re: [PATCH V4 00/15] selftests/resctrl: Support diverse platforms
with MBM and MBA tests

On 10/24/24 15:18, Reinette Chatre wrote:
> Changes since V3:
> - V3: https://lore.kernel.org/all/cover.1729218182.git.reinette.chatre@intel.com/
> - Rebased on HEAD 2a027d6bb660 of kselftest/next.
> - Fix empty string parsing issues pointed out by Ilpo.
> - Add Reviewed-by tags.
> - Please see individual patches for detailed changes.
>
> Changes since V2:
> - V2: https://lore.kernel.org/all/cover.1726164080.git.reinette.chatre@intel.com/
> - Add fix to protect against buffer overflow when parsing text from sysfs files.
> - Add cleanup patch to address use of magic constants as pointed out by
> Ilpo.
> - Add Reviewed-by tags where received, except for "selftests/resctrl: Use cache
> size to determine "fill_buf" buffer size" that changed too much since
> receiving the Reviewed-by tag.
> - Please see individual patches for detailed changes.
>
> Changes since V1:
> - V1: https://lore.kernel.org/cover.1724970211.git.reinette.chatre@intel.com/
> - V2 contains the same general solutions to the stated problem as V1 but these
> are now preceded by more fixes (patches 1 to 5) and improved robustness
> (patches 6 to 9) to existing tests before the series gets back
> to solving the original problem with more confidence in patches 10 to 13.
> - The possibility of making "memflush = false" for the CMT test was discussed
> during V1. Modifying this setting does not have a significant impact on the
> observed results that are already well within acceptable range and this
> version thus keeps original default. If performance was a goal it may
> be possible to do further experimentation where "memflush = false" could
> eliminate the need for the sleep(1) within the test wrapper, but
> improving the performance is not a goal of this work.
> - (New) Support what seems to be unintended ability for user space to provide
> parameters to "fill_buf" by making the parsing robust and only support
> changing parameters that are supported to be changed. Drop support for
> "write" operation since it has never been measured.
> - (New) Improve wraparound handling. (Ilpo)
> - (New) A couple of new fixes addressing issues discovered during development.
> - (Change from V1) To support fill_buf parameters provided by user space as
> well as test specific fill_buf parameters struct fill_buf_param is no longer
> just a member of struct resctrl_val_param, instead there could be at most
> two instances of struct fill_buf_param, the immutable parameters provided
> by user space and the parameters used by individual tests. (Ilpo)
> - Please see individual patches for detailed changes.
>
> V1 cover:
>
> The resctrl selftests for Memory Bandwidth Allocation (MBA) and Memory
> Bandwidth Monitoring (MBM) are failing on some (for example [1]) Emerald
> Rapids systems. The test failures result from the following two
> properties of these systems:
> 1) Emerald Rapids systems can have up to 320MB L3 cache. The resctrl
> MBA and MBM selftests measure memory traffic for which a hardcoded
> 250MB buffer has been sufficient so far. On platforms with L3 cache
> larger than the buffer, the buffer fits in the L3 cache and thus
> no/very little memory traffic is generated during the "memory
> bandwidth" tests.
> 2) Some platform features, for example RAS features or memory
> performance features that generate memory traffic may drive accesses
> that are counted differently by performance counters and MBM
> respectively, for instance generating "overhead" traffic which is not
> counted against any specific RMID. Until now these counting
> differences have always been "in the noise". On Emerald Rapids
> systems the maximum MBA throttling (10% memory bandwidth)
> throttles memory bandwidth to where memory accesses by these other
> platform features push the memory bandwidth difference between
> memory controller performance counters and resctrl (MBM) beyond the
> tests' hardcoded tolerance.
>
> Make the tests more robust against platform variations:
> 1) Let the buffer used by memory bandwidth tests be guided by the size
> of the L3 cache.
> 2) Larger buffers require longer initialization time before the buffer can
> be used for measurement. Rework the tests to ensure that buffer
> initialization is complete before measurements start.
> 3) Do not compare performance counters and MBM measurements at low
> bandwidth. The value of "low" is hardcoded to 750MiB based on
> measurements on Emerald Rapids, Sapphire Rapids, and Ice Lake
> systems. This limit is not applicable to AMD systems since it
> only applies to the MBA and MBM tests that are isolated to Intel.
>
> [1]
> https://ark.intel.com/content/www/us/en/ark/products/237261/intel-xeon-platinum-8592-processor-320m-cache-1-9-ghz.html
>
> Reinette Chatre (15):
> selftests/resctrl: Make functions only used in same file static
> selftests/resctrl: Print accurate buffer size as part of MBM results
> selftests/resctrl: Fix memory overflow due to unhandled wraparound
> selftests/resctrl: Protect against array overrun during iMC config
> parsing
> selftests/resctrl: Protect against array overflow when reading strings
> selftests/resctrl: Make wraparound handling obvious
> selftests/resctrl: Remove "once" parameter required to be false
> selftests/resctrl: Only support measured read operation
> selftests/resctrl: Remove unused measurement code
> selftests/resctrl: Make benchmark parameter passing robust
> selftests/resctrl: Ensure measurements skip initialization of default
> benchmark
> selftests/resctrl: Use cache size to determine "fill_buf" buffer size
> selftests/resctrl: Do not compare performance counters and resctrl at
> low bandwidth
> selftests/resctrl: Keep results from first test run
> selftests/resctrl: Replace magic constants used as array size
>
> tools/testing/selftests/resctrl/cmt_test.c | 37 +-
> tools/testing/selftests/resctrl/fill_buf.c | 45 +-
> tools/testing/selftests/resctrl/mba_test.c | 54 ++-
> tools/testing/selftests/resctrl/mbm_test.c | 37 +-
> tools/testing/selftests/resctrl/resctrl.h | 79 +++-
> .../testing/selftests/resctrl/resctrl_tests.c | 95 +++-
> tools/testing/selftests/resctrl/resctrl_val.c | 447 +++++-------------
> tools/testing/selftests/resctrl/resctrlfs.c | 19 +-
> 8 files changed, 354 insertions(+), 459 deletions(-)
>
>
> base-commit: 2a027d6bb66002c8e50e974676f932b33c5fce10
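
As a rough illustration of items 1) and 3) in the quoted V1 cover (buffer size guided by the L3 cache, and no comparison at low bandwidth), the logic could look something like the sketch below. This is not code from the series. The helper names pick_span() and should_compare(), the 2x multiplier, the use of the old 250MB buffer as a lower bound, and the choice to skip when either measurement is below the limit are all assumptions made for illustration; only the 250MB and 750MiB figures come from the cover letter.

```c
#include <stdbool.h>
#include <stddef.h>

/* 1024 * 1024 bytes; used for both the "250MB" and "750MiB" figures here. */
#define MB (1024UL * 1024UL)

/* Assumption: keep the previously hardcoded 250MB buffer as a lower bound. */
#define MIN_SPAN (250 * MB)

/* From the cover letter: "low" bandwidth is hardcoded to 750MiB. */
#define LOW_BW_MIB 750UL

/*
 * Hypothetical helper: choose a buffer larger than the L3 cache so that
 * reads cannot be served from cache alone. The 2x factor is an assumption.
 */
static size_t pick_span(size_t l3_bytes)
{
	size_t span = 2 * l3_bytes;

	return span > MIN_SPAN ? span : MIN_SPAN;
}

/*
 * Hypothetical helper: compare memory controller counters against
 * resctrl (MBM) only when both measurements are at or above the
 * low-bandwidth limit. Treating "either below" as low is an assumption.
 */
static bool should_compare(unsigned long bw_imc_mib, unsigned long bw_resc_mib)
{
	return bw_imc_mib >= LOW_BW_MIB && bw_resc_mib >= LOW_BW_MIB;
}
```

Under these assumptions, a system with a 320MB L3 cache would get a 640MB buffer, while a system with a small L3 cache would keep the 250MB buffer.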

Is this patch series ready to be applied?

thanks,
-- Shuah