Date:   Wed, 17 Nov 2021 08:46:16 -0800
From:   Reinette Chatre <reinette.chatre@...el.com>
To:     "tan.shaopeng@...itsu.com" <tan.shaopeng@...itsu.com>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Shuah Khan <shuah@...nel.org>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>
Subject: Re: [PATCH] selftests/resctrl: Skip MBM&CMT tests when Intel Sub-NUMA

Hi Shaopeng Tan,

On 11/14/2021 11:11 PM, tan.shaopeng@...itsu.com wrote:
> Hi Reinette,
> 
>> On 11/10/2021 12:27 AM, Shaopeng Tan wrote:
>>> From: "Tan, Shaopeng" <tan.shaopeng@...fujitsu.com>
>>>
>>> When the Intel Sub-NUMA Clustering(SNC) feature is enabled,
>>> the CMT and MBM counters may not be accurate.
>>> In this case, skip MBM&CMT tests.
>>>
>>> Signed-off-by: Shaopeng Tan <tan.shaopeng@...fujitsu.com>
>>> ---
>>> Hello,
>>>
>>> According to the Intel RDT reference Manual,
>>> when the sub-numa clustering feature is enabled, the CMT and MBM
>>> counters may not be accurate.
>>> When running CMT tests and MBM tests on Intel processor, the result is
>>> "not ok".
>>> So, fix it to skip the CMT & MBM test when the Intel Sub-NUMA
>>> Clustering(SNC) feature is enabled.
>>>
>>
>> It is not clear to me which exact document you refer to but I did find a
>> RDT reference manual at the link below that describes the problem you
>> mention:
>> https://www.intel.com/content/dam/develop/external/us/en/documents/180115-intel-rdtcascadelake-serverreferencemanual-806717.pdf
> 
> Yes, I referred to this manual.
> 
>> What is not mentioned in your description is that this is a hardware
>> errata so the test is expected to fail on these systems and I find that
>> disabling the test for all systems based on this hardware errata is too
>> drastic.
> 
> Understood. It is not reasonable to disable the test for all systems
> based on this hardware errata.
> When I ran resctrl_tests on an Intel(R) Xeon(R) Gold 6254 CPU,
> the result of CMT & MBM was "not ok", and I took some time to debug it.
> So that other people can run the test smoothly, I'd like to update the
> patch to disable the test only on 2nd Generation Intel Xeon scalable processors.

I've been thinking about this some more and I do not think that the test
should be disabled. There is a clear incompatibility between SNC and RDT
on these systems and I do not think the test should hide that; indeed, it
is helpful to highlight that there is an issue. Even so, spending time
debugging a known issue is not a good use of time. Instead of skipping
the test on these systems, could the test perhaps be improved to provide
more information on failure, to help the user decide whether they really
need SNC enabled? The test would show that RDT cannot be used on their
system with the SNC configuration; hiding that information by skipping
the test may create the false idea that RDT is working with that
configuration.

Reinette
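
One way the suggestion above (report more on failure instead of skipping)
could be sketched is shown below. This is only an illustration, not code
from the patch or from the resctrl selftests. It assumes a heuristic that
is not stated in this thread: if the CPUs sharing cpu0's L3 cache
outnumber the CPUs in NUMA node 0, SNC is probably enabled. The function
names (count_cpus, snc_ratio, failure_hint) are hypothetical; the two
sysfs files are standard Linux paths.

```python
# Hypothetical sketch: guess whether SNC is enabled and build a hint that a
# failing CMT/MBM test could print. The heuristic is an assumption.

L3_SHARED = "/sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list"
NODE0_CPUS = "/sys/devices/system/node/node0/cpulist"


def count_cpus(cpulist: str) -> int:
    """Count CPUs in a sysfs cpulist string such as '0-3,8-11'."""
    total = 0
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            total += int(hi) - int(lo) + 1
        else:
            total += 1
    return total


def snc_ratio(l3_cpulist: str, node_cpulist: str) -> int:
    """Estimate how many NUMA nodes share one L3 cache (1 means no SNC)."""
    node_cpus = count_cpus(node_cpulist)
    if node_cpus == 0:
        return 1
    return max(1, count_cpus(l3_cpulist) // node_cpus)


def failure_hint(ratio: int):
    """Return an extra diagnostic line for a failed test, or None."""
    if ratio <= 1:
        return None
    return ("Sub-NUMA Clustering (SNC) appears to be enabled "
            f"({ratio} NUMA nodes per L3 cache); CMT and MBM counters "
            "may not be accurate in this configuration.")


def read_sysfs(path: str) -> str:
    """Read one sysfs attribute as text."""
    with open(path) as f:
        return f.read()
```

On a failing CMT or MBM run, the test could then call
failure_hint(snc_ratio(read_sysfs(L3_SHARED), read_sysfs(NODE0_CPUS)))
and print the result when it is not None. Offline CPUs can skew the
ratio, so the output is best treated as a hint rather than a verdict.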
