linux-kernel - Re: [PATCH 24/24] selftests/resctrl: Ignore failures from L2 CAT test with <= 2 bits

Message-ID: <721c6735-dab9-49c7-bdb2-b34388144e21@intel.com>
Date:   Fri, 3 Nov 2023 15:53:41 -0700
From:   Reinette Chatre <reinette.chatre@...el.com>
To:     Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
CC:     <linux-kselftest@...r.kernel.org>, Shuah Khan <shuah@...nel.org>,
        Shaopeng Tan <tan.shaopeng@...fujitsu.com>,
        Maciej Wieczór-Retman 
        <maciej.wieczor-retman@...el.com>,
        Fenghua Yu <fenghua.yu@...el.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 24/24] selftests/resctrl: Ignore failures from L2 CAT test
 with <= 2 bits

Hi Ilpo,

On 11/3/2023 3:24 AM, Ilpo Järvinen wrote:
> On Thu, 2 Nov 2023, Reinette Chatre wrote:
>> On 10/24/2023 2:26 AM, Ilpo Järvinen wrote:
>>> The L2 CAT test with a low number of bits tends to fail occasionally
>>> because of what seems to be random variation. The margin is quite small
>>> to begin with for <= 2 bits in the CBM. At times, the result can even
>>> become negative. While it would be possible to allow negative values for
>>> those cases, it would be more confusing to the user.
>>>
>>> Ignore failures from the tests where <= 2 bits were used to avoid false
>>> negative results.
>>>
>>
>> I think the core message is that 2 or fewer bits should not be used. Instead
>> of running the test and ignoring the results, the test should perhaps just
>> not be run.
> 
> I considered that, but it often does work, so it felt like a shame not to
> present them when they're successful. Then I just had to decide how to deal
> with the cases where they fail.
> 
> Also, if I make it not run down to 1 bit, those numbers will never be seen
> by anyone. Being unreliable doesn't mean the 2- and 1-bit results contain
> no information for a human reader, who can make a more informed decision
> about whether something is truly working or not. We could, hypothetically,
> one day have a HW issue that makes a 1-bit L2 mask misbehave, and if the
> number is never seen by anyone, it's extremely unlikely to be caught.
> 
> They are just not reliable enough for a simple automated threshold
> currently. Maybe something other than the average value would be, but that
> would need to be explored. I also suspect the memory address of the buffer
> might affect the value. With L3 it definitely should, because of how
> things work, but I don't know if that holds for L2 too. I have earlier
> tried playing with the buffer addresses with L3, but as it didn't
> immediately yield a positive outcome in guarding against outliers, I
> postponed that investigation (e.g., my alloc pattern might have been too
> straightforward and didn't provide enough entropy in the buffer start
> address because I just alloc'ed n x buf_size buffers back-to-back).
> 
> But I don't have a very strong opinion on this, so if you prefer that I
> just stop at 3 bits, I can change it?
> 

We seem to have different users in mind when thinking about this. I was
considering the users that just run the selftest to get a pass/fail. You
seem to also consider folks using this for validation. I'm ok with keeping
this change to accommodate both.

Reinette
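The policy agreed on in this thread (still run and print the 1- and 2-bit results, but don't let their failures fail the test) can be sketched in C roughly as follows. This is not the code from the patch: check_step(), evaluate(), MIN_DIFF_PERCENT and UNRELIABLE_BITS are hypothetical names, and the 1% threshold is an arbitrary value chosen only for illustration. The sketch assumes cache misses should rise as the CBM shrinks, so a small or negative increase counts as a failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical threshold: minimum % increase in cache misses expected when
 * the CBM shrinks by one bit. Not the value used by the resctrl selftest. */
#define MIN_DIFF_PERCENT 1.0
/* Hypothetical cutoff: failures at or below this many bits are ignored. */
#define UNRELIABLE_BITS 2

enum result { RES_PASS, RES_FAIL, RES_FAIL_IGNORED };

/* Classify one step: `prev` misses were measured with a wider mask, `cur`
 * misses with a mask of `bits` bits. */
static enum result check_step(unsigned int bits, unsigned long prev,
			      unsigned long cur)
{
	double diff;

	assert(bits > 0 && prev > 0);
	diff = ((double)cur - (double)prev) * 100.0 / (double)prev;
	if (diff >= MIN_DIFF_PERCENT)
		return RES_PASS;
	/* Small masks have too little margin; report but don't fail. */
	return bits <= UNRELIABLE_BITS ? RES_FAIL_IGNORED : RES_FAIL;
}

/* Walk measurements from the widest to the narrowest mask. Every step is
 * printed, so the <= 2 bit numbers stay visible to a human reader, but
 * only non-ignored failures affect the overall pass/fail result. */
static bool evaluate(const unsigned int *bits, const unsigned long *misses,
		     size_t n)
{
	bool ok = true;

	for (size_t i = 1; i < n; i++) {
		enum result r = check_step(bits[i], misses[i - 1], misses[i]);

		printf("# %u bits: %lu misses: %s\n", bits[i], misses[i],
		       r == RES_PASS ? "pass" :
		       r == RES_FAIL ? "FAIL" : "fail (ignored)");
		if (r == RES_FAIL)
			ok = false;
	}
	return ok;
}
```

Printing the ignored steps instead of skipping them is what keeps the 1- and 2-bit numbers available to someone doing validation, while the return value gives the plain pass/fail that a casual selftest user wants.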