Message-ID: <721c6735-dab9-49c7-bdb2-b34388144e21@intel.com>
Date: Fri, 3 Nov 2023 15:53:41 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Ilpo Järvinen <ilpo.jarvinen@...ux.intel.com>
CC: <linux-kselftest@...r.kernel.org>, Shuah Khan <shuah@...nel.org>,
Shaopeng Tan <tan.shaopeng@...fujitsu.com>,
Maciej Wieczór-Retman
<maciej.wieczor-retman@...el.com>,
Fenghua Yu <fenghua.yu@...el.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 24/24] selftests/resctrl: Ignore failures from L2 CAT test
with <= 2 bits
Hi Ilpo,
On 11/3/2023 3:24 AM, Ilpo Järvinen wrote:
> On Thu, 2 Nov 2023, Reinette Chatre wrote:
>> On 10/24/2023 2:26 AM, Ilpo Järvinen wrote:
>>> The L2 CAT test with a low number of bits tends to fail occasionally because
>>> of what seems to be random variation. The margin is quite small to begin with
>>> for <= 2 bits in the CBM. At times, the result can even become negative.
>>> While it would be possible to allow negative values for those cases, it
>>> would be more confusing to the user.
>>>
>>> Ignore failures from the tests where <= 2 bits were used to avoid false
>>> negative results.
>>>
>>
>> I think the core message is that 2 or fewer bits should not be used. Instead
>> of running the test and ignoring the results the test should perhaps just not
>> be run.
>
> I considered that but it often does work, so it felt a shame not to present
> them when they're successful. Then I just had to decide how to deal with
> the cases where they failed.
>
> Also, if I make it not run down to 1 bit, those numbers will never ever
> be seen by anyone. The 2-bit and 1-bit results can still carry information
> for a human reader who is able to make a more informed decision about
> whether something is truly working or not. We could, hypothetically, have
> a HW issue one day which makes a 1-bit L2 mask misbehave and if the
> number is never seen by anyone, it's extremely unlikely to be caught
> easily.
>
> They are just not reliable enough for a simple automated threshold currently.
> Maybe something other than the average value would be; that would need to be
> explored. I also suspect the memory address of the buffer might affect
> the value: with L3 it definitely should because of how things work, but
> I don't know if that holds for L2 too. I have earlier tried playing with
> the buffer addresses with L3 but as that didn't immediately yield a positive
> outcome for guarding against outliers, I postponed that investigation (e.g.,
> my alloc pattern might have been too straightforward and didn't provide
> enough entropy in the buffer start address because I just alloc'ed n x
> buf_size buffers back-to-back).
>
> But I don't have a very strong opinion on this, so if you prefer that I just
> stop at 3 bits, I can change it.
>
We seem to have different users in mind when thinking about this. I was
considering the users that just run the selftest to get a pass/fail. You
seem to also consider folks using this for validation. I'm ok with keeping
this change to accommodate both.
Reinette
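
The policy discussed in this thread (run every mask width, report every number, but let a below-threshold result count as a failure only when more than 2 bits were used) can be sketched roughly as follows. This is a minimal illustration and not the selftest's actual code: the function name `evaluate_cat_results`, the `min_diff_percent` threshold of 1.0, and the input layout are all assumptions made for the example.

```python
def evaluate_cat_results(results, min_diff_percent=1.0, unreliable_bits=2):
    """Evaluate per-mask-width results of a hypothetical CAT-style test.

    results: iterable of (bits, avg_diff_percent) pairs, where bits is the
             number of bits set in the CBM and avg_diff_percent is the
             averaged measured difference for that mask (may be negative).
    min_diff_percent: assumed pass threshold, not taken from the selftest.
    unreliable_bits: mask widths at or below this never fail the test.

    Returns (passed, report); report keeps every number so a human reader
    can still inspect the 1-bit and 2-bit results.
    """
    report = []
    passed = True
    for bits, avg_diff_percent in results:
        if avg_diff_percent >= min_diff_percent:
            status = "ok"
        elif bits <= unreliable_bits:
            # Below threshold, but too noisy to trust: show it, don't fail.
            status = "ignored"
        else:
            status = "fail"
            passed = False
        report.append((bits, avg_diff_percent, status))
    return passed, report


# Example with made-up numbers: the 2-bit and 1-bit results fall below the
# threshold (one is even negative) but do not fail the overall test.
passed, report = evaluate_cat_results([(4, 6.1), (3, 2.4), (2, 0.3), (1, -0.8)])
for bits, diff, status in report:
    print(f"{bits} bit(s): {diff:+.1f}% [{status}]")
print("overall:", "pass" if passed else "fail")
```

Keeping the ignored entries in the report serves both kinds of users mentioned above: the overall pass/fail stays stable, while the raw low-bit numbers remain visible for validation.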