Message-ID: <cf7439c4-f72c-a145-5a65-84ae15c5d96f@intel.com>
Date: Tue, 12 Sep 2023 15:10:53 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
Shuah Khan <shuah@...nel.org>,
Shuah Khan <skhan@...uxfoundation.org>,
<linux-kselftest@...r.kernel.org>,
Maciej Wieczór-Retman
<maciej.wieczor-retman@...el.com>
CC: <linux-kernel@...r.kernel.org>,
Shaopeng Tan <tan.shaopeng@...fujitsu.com>,
<stable@...r.kernel.org>
Subject: Re: [PATCH 5/5] selftests/resctrl: Reduce failures due to outliers in
MBA/MBM tests

Hi Ilpo,

On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> 5% difference upper bound for success is a bit on the low side for the

"a bit on the low side" is very vague.

> MBA and MBM tests. Some platforms produce outliers that are slightly
> above that, typically 6-7%.
>
> Relaxing the MBA/MBM success bound to 8% removes most of the failures
> due those frequent outliers.

This description needs more context on what issue is being solved here.
What does the % difference represent? How was the new percentage determined?
Did you investigate why there are differences between platforms? From
what I understand these tests measure memory bandwidth using perf and
resctrl and then compare the difference. Are there interesting things
about the platforms on which the difference is higher than 5%? Could
those be systems with multiple sockets (and thus multiple PMUs that need
to be set up, reset, and read)? Can the reading of the counters be improved
instead of relaxing the success criteria? A quick comparison between
get_mem_bw_imc() and get_mem_bw_resctrl() makes me think that a difference
is not surprising ... note how the PMU counters are started and reset
(potentially on multiple sockets) at every iteration while the resctrl
counters keep rolling and new values are just subtracted from previous.
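
A minimal sketch of the suspicion above, not the selftest code. It assumes
that some memory traffic goes unobserved while the PMU counters are reset and
restarted, which is only one possible explanation and has not been confirmed.
All names (struct iteration, pmu_style_total(), rolling_style_total(),
percent_diff()) and all numbers are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of one measurement iteration, traffic in MB. */
struct iteration {
	long workload_mb;   /* traffic while both counters are observing */
	long setup_gap_mb;  /* ASSUMED traffic during PMU reset/start */
};

/* PMU style: counters are reset and started at every iteration, so
 * traffic in the assumed setup gap is never counted. */
long pmu_style_total(const struct iteration *it, int n)
{
	long total = 0;

	for (int i = 0; i < n; i++)
		total += it[i].workload_mb;
	return total;
}

/* resctrl style: a free-running counter keeps rolling; each reading is
 * the current value minus the previous one, so nothing is skipped. */
long rolling_style_total(const struct iteration *it, int n)
{
	long counter = 0, prev = 0, total = 0;

	for (int i = 0; i < n; i++) {
		counter += it[i].setup_gap_mb + it[i].workload_mb;
		total += counter - prev;
		prev = counter;
	}
	return total;
}

/* Integer percent difference of bw_other relative to bw_ref. */
int percent_diff(long bw_ref, long bw_other)
{
	assert(bw_ref != 0);
	return (int)(labs(bw_other - bw_ref) * 100 / bw_ref);
}
```

With two iterations of 1000 MB of workload and a 60 MB gap each, the PMU-style
total is 2000 and the rolling total is 2120, so percent_diff() returns 6. That
is above a 5% bound and below an 8% one, with no change in the actual memory
traffic.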

Reinette