linux-kernel - Re: [PATCH 5/5] selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <cf7439c4-f72c-a145-5a65-84ae15c5d96f@intel.com>
Date:   Tue, 12 Sep 2023 15:10:53 -0700
From:   Reinette Chatre <reinette.chatre@...el.com>
To:     Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>,
        Shuah Khan <shuah@...nel.org>,
        Shuah Khan <skhan@...uxfoundation.org>,
        <linux-kselftest@...r.kernel.org>,
        Maciej Wieczór-Retman 
        <maciej.wieczor-retman@...el.com>
CC:     <linux-kernel@...r.kernel.org>,
        Shaopeng Tan <tan.shaopeng@...fujitsu.com>,
        <stable@...r.kernel.org>
Subject: Re: [PATCH 5/5] selftests/resctrl: Reduce failures due to outliers in
 MBA/MBM tests

Hi Ilpo,

On 9/11/2023 4:19 AM, Ilpo Järvinen wrote:
> 5% difference upper bound for success is a bit on the low side for the

"a bit on the low side" is very vague.

> MBA and MBM tests. Some platforms produce outliers that are slightly
> above that, typically 6-7%.
> 
> Relaxing the MBA/MBM success bound to 8% removes most of the failures
> due those frequent outliers.

This description needs more context on what issue is being solved here.
What does the % difference represent? How was new percentage determined?

Did you investigate why there are differences between platforms? From
what I understand these tests measure memory bandwidth using perf and
resctrl and then compare the difference. Are there interesting things
about the platforms on which the difference is higher than 5%? Could
those be systems with multiple sockets (and thus multiple PMUs that need
to be setup, reset, and read)? Can the reading of the counters be improved
instead of relaxing the success criteria? A quick comparison between
get_mem_bw_imc() and get_mem_bw_resctrl() makes me think that a difference
is not surprising ... note how the PMU counters are started and reset
(potentially on multiple sockets) at every iteration while the resctrl
counters keep rolling and new values are just subtracted from previous.

Reinette