lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a2c7baf1-1c69-45a5-8755-38d152d33fae@intel.com>
Date: Fri, 6 Sep 2024 17:05:29 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
CC: <fenghua.yu@...el.com>, <shuah@...nel.org>, <tony.luck@...el.com>,
	<peternewman@...gle.com>, <babu.moger@....com>,
	Maciej Wieczór-Retman <maciej.wieczor-retman@...el.com>,
	<linux-kselftest@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 5/6] selftests/resctrl: Do not compare performance
 counters and resctrl at low bandwidth

Hi Ilpo,

On 9/6/24 1:44 AM, Ilpo Järvinen wrote:
> On Thu, 5 Sep 2024, Reinette Chatre wrote:
>> On 9/5/24 4:45 AM, Ilpo Järvinen wrote:
>>> On Wed, 4 Sep 2024, Reinette Chatre wrote:
>>>> On 9/4/24 4:43 AM, Ilpo Järvinen wrote:
>>>>> On Fri, 30 Aug 2024, Reinette Chatre wrote:
>>>>>> On 8/30/24 4:42 AM, Ilpo Järvinen wrote:
>>>>>>> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>>>>>>>
>>>>>>>> The MBA test incrementally throttles memory bandwidth, each time
>>>>>>>> followed by a comparison between the memory bandwidth observed
>>>>>>>> by the performance counters and resctrl respectively.
>>>>>>>>
>>>>>>>> While a comparison between performance counters and resctrl is
>>>>>>>> generally appropriate, they do not have an identical view of
>>>>>>>> memory bandwidth. For example RAS features or memory performance
>>>>>>>> features that generate memory traffic may drive accesses that are
>>>>>>>> counted differently by performance counters and MBM respectively,
>>>>>>>> for instance generating "overhead" traffic which is not counted
>>>>>>>> against any specific RMID. As a ratio, this different view of
>>>>>>>> memory
>>>>>>>> bandwidth becomes more apparent at low memory bandwidths.
>>>>>>>
>>>>>>> Interesting.
>>>>>>>
>>>>>>> I did some time back prototype with a change to MBM test such that
>>>>>>> instead
>>>>>>> of using once=false I changed fill_buf to be able to run N passes
>>>>>>> through
>>>>>>> the buffer which allowed me to know how many reads were performed by
>>>>>>> the
>>>>>>> benchmark. This yielded numerical difference between all those 3
>>>>>>> values
>>>>>>> (# of reads, MBM, perf) which also varied from arch to another so it
>>>>>>> didn't end up making an usable test.
>>>>>>>
>>>>>>> I guess I now have an explanation for at least a part of the
>>>>>>> differences.
>>>>>>>
>>>>>>>> It is not practical to enable/disable the various features that
>>>>>>>> may generate memory bandwidth to give performance counters and
>>>>>>>> resctrl an identical view. Instead, do not compare performance
>>>>>>>> counters and resctrl view of memory bandwidth when the memory
>>>>>>>> bandwidth is low.
>>>>>>>>
>>>>>>>> Bandwidth throttling behaves differently across platforms
>>>>>>>> so it is not appropriate to drop measurement data simply based
>>>>>>>> on the throttling level. Instead, use a threshold of 750MiB
>>>>>>>> that has been observed to support adequate comparison between
>>>>>>>> performance counters and resctrl.
>>>>>>>>
>>>>>>>> Signed-off-by: Reinette Chatre <reinette.chatre@...el.com>
>>>>>>>> ---
>>>>>>>>      tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
>>>>>>>>      tools/testing/selftests/resctrl/resctrl.h  | 6 ++++++
>>>>>>>>      2 files changed, 13 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> b/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> index cad473b81a64..204b9ac4b108 100644
>>>>>>>> --- a/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> +++ b/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long
>>>>>>>> *bw_imc,
>>>>>>>> unsigned long *bw_resc)
>>>>>>>>        		avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
>>>>>>>>      		avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
>>>>>>>> +		if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc <
>>>>>>>> THROTTLE_THRESHOLD) {
>>>>>>>> +			ksft_print_msg("Bandwidth below threshold (%d
>>>>>>>> MiB).
>>>>>>>> Dropping results from MBA schemata %u.\n",
>>>>>>>> +					THROTTLE_THRESHOLD,
>>>>>>>> +					ALLOCATION_MAX -
>>>>>>>> ALLOCATION_STEP *
>>>>>>>> allocation);
>>>>>>>
>>>>>>> The second one too should be %d.
>>>>>>>
>>>>>>
>>>>>> hmmm ... I intended to have it be consistent with the ksft_print_msg()
>>>>>> that
>>>>>> follows. Perhaps allocation can be made unsigned instead?
>>>>>
>>>>> If you go that way, then it would be good to make the related defines
>>>>> and
>>>>> allocation in mba_setup() unsigned too, although the latter is a bit
>>>>> scary
>>>>
>>>> Sure, will look into that.
>>>>
>>>>> because it does allocation -= ALLOCATION_STEP which could underflow if
>>>>> the
>>>>> defines are ever changed.
>>>>>
>>>>
>>>> Is this not already covered in the following check:
>>>> 	if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
>>>> 		return END_OF_TESTS;
>>>>
>>>> We are talking about hardcoded constants though.
>>>
>>> Borderline yes. It is "covered" by the allocation > ALLOCATION_MAX but
>>> it's also very non-intuitive to let the value underflow and then check for
>>> that with the > operator.
>>
>> My understanding is that this is the traditional way of checking overflow
>> (or more accurately wraparound). There are several examples of this pattern
>> in the kernel and it is also the pattern that I understand Linus referred
>> to as "traditional" in [1]. Even the compiler's intrinsic overflow checkers
>> do checking in this way (perform the subtraction and then check if it
>> overflowed) [2].
> 
> Fair enough. I've never come across that kind of claim before.
> 
>>> You're correct that they're constants so one would need to tweak the
>>> source to end up into this condition in the first place.
>>>
>>> Perhaps I'm being overly pendantic here but I in general don't like
>>> leaving trappy and non-obvious logic like that lying around because one
>>> day one of such will come back biting.
>>
>> It is not clear to me what is "trappy" about this. The current checks will
>> catch the wraparound if somebody changes ALLOCATION_STEP in your scenario, no?
>>
>>> So, if a variable is unsigned and we ever do subtraction (or adding
>>> negative numbers to it), I'd prefer additional check to prevent ever
>>> underflowing it unexpectedly. Or leave them signed.
>>
>> To support checking at the time of subtraction we either need to change the
>> flow of that function since it cannot exit with failure if that subtraction
>> fails because of overflow/wraparound, or we need to introduce more state that
>> will be an additional check that the existing
>> "if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)"
>> would have caught anyway.
>>
>> In either case, to do this checking at the time of subtraction the ideal way
>> would be to use check_sub_overflow() ... but it again does exactly what
>> you find to be non-intuitive since the implementation in
>> tools/include/linux/overflow.h uses the gcc intrinsics that does subtraction
>> first and then checks if overflow occurred.
> 
> It's trappy because by glance, that check looks unnecessary since
> allocation starts from max and goes downwards. Also worth to note,
> the check is not immediately after the decrement but done on the next
> iteration.

Right. This is probably what causes most confusion.

Considering that, what do you think of below that avoids wraparound entirely:

---8<---
From: Reinette Chatre <reinette.chatre@...el.com>
Subject: [PATCH] selftests/resctrl: Make wraparound handling obvious

Within mba_setup() the programmed bandwidth delay value starts
at the maximum (100, or rather ALLOCATION_MAX) and progresses
towards ALLOCATION_MIN by decrementing with ALLOCATION_STEP.

The programmed bandwidth delay should never be negative, so
representing it with an unsigned int is most appropriate. This
may introduce confusion because of the "allocation > ALLOCATION_MAX"
check used to check wraparound of the subtraction.

Modify the mba_setup() flow to start at the minimum, ALLOCATION_MIN,
and incrementally, with ALLOCATION_STEP steps, adjust the
bandwidth delay value. This avoids wraparound while making the purpose
of "allocation > ALLOCATION_MAX" clear and eliminates the
need for the "allocation < ALLOCATION_MIN" check.

Reported-by: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@...el.com>
---
Changes since V1:
- New patch
---
  tools/testing/selftests/resctrl/mba_test.c | 12 +++++++-----
  1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index ab8496a4925b..947d5699f0c8 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -39,7 +39,8 @@ static int mba_setup(const struct resctrl_test *test,
  		     const struct user_params *uparams,
  		     struct resctrl_val_param *p)
  {
-	static int runs_per_allocation, allocation = 100;
+	static unsigned int allocation = ALLOCATION_MIN;
+	static int runs_per_allocation = 0;
  	char allocation_str[64];
  	int ret;
  
@@ -50,7 +51,7 @@ static int mba_setup(const struct resctrl_test *test,
  	if (runs_per_allocation++ != 0)
  		return 0;
  
-	if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
+	if (allocation > ALLOCATION_MAX)
  		return END_OF_TESTS;
  
  	sprintf(allocation_str, "%d", allocation);
@@ -59,7 +60,7 @@ static int mba_setup(const struct resctrl_test *test,
  	if (ret < 0)
  		return ret;
  
-	allocation -= ALLOCATION_STEP;
+	allocation += ALLOCATION_STEP;
  
  	return 0;
  }
@@ -72,8 +73,9 @@ static int mba_measure(const struct user_params *uparams,
  
  static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
  {
-	int allocation, runs;
+	unsigned int allocation;
  	bool ret = false;
+	int runs;
  
  	ksft_print_msg("Results are displayed in (MB)\n");
  	/* Memory bandwidth from 100% down to 10% */
@@ -103,7 +105,7 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
  			       avg_diff_per > MAX_DIFF_PERCENT ?
  			       "Fail:" : "Pass:",
  			       MAX_DIFF_PERCENT,
-			       ALLOCATION_MAX - ALLOCATION_STEP * allocation);
+			       ALLOCATION_MIN + ALLOCATION_STEP * allocation);
  
  		ksft_print_msg("avg_diff_per: %d%%\n", avg_diff_per);
  		ksft_print_msg("avg_bw_imc: %lu\n", avg_bw_imc);


> 
> The risk here is that somebody removes allocation > ALLOCATION_MAX check.
> 
> Something called check_sub_overflow() is not subject to a similar risk
> regardless of what operations occur inside it.

Reinette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ