linux-kernel - Re: [PATCH] fs/resctrl,x86/resctrl: Factor mba rounding to be per-arch

Message-ID: <1493d341-16a5-47e9-a834-cd8133b91fed@intel.com>
Date: Thu, 16 Oct 2025 08:57:59 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Dave Martin <Dave.Martin@....com>
CC: <linux-kernel@...r.kernel.org>, Tony Luck <tony.luck@...el.com>, "James
 Morse" <james.morse@....com>, Thomas Gleixner <tglx@...utronix.de>, "Ingo
 Molnar" <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, Dave Hansen
	<dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, "Jonathan
 Corbet" <corbet@....net>, <x86@...nel.org>, <linux-doc@...r.kernel.org>
Subject: Re: [PATCH] fs/resctrl,x86/resctrl: Factor mba rounding to be
 per-arch

Hi Dave,

On 10/15/25 8:18 AM, Dave Martin wrote:
> Hi Reinette,
> 
> Just following up on the skipped L2_NONCONT_CAT test -- see below.

Thank you very much.

> 
> [...]
> 
> On Mon, Sep 22, 2025 at 03:39:47PM +0100, Dave Martin wrote:
> 
> [...]
> 
>> On Fri, Sep 12, 2025 at 03:19:04PM -0700, Reinette Chatre wrote:
> 
> [...]
> 
>>> On 9/2/25 9:24 AM, Dave Martin wrote:
> 
> [...]
> 
>>>> Testing: the resctrl MBA and MBM tests pass on a random x86 machine (+
>>>> the other tests except for the NONCONT_CAT tests, which do not seem to
>>>> be supported in my configuration -- and have nothing to do with the
>>>> code touched by this patch).
>>>
>>> Is the NONCONT_CAT test failing (i.e printing "not ok")?
>>>
>>> The NONCONT_CAT tests may print error messages as debug information as part of
>>> running, but these errors are expected as part of the test. The test should accurately
>>> state whether it passed or failed though. For example, below attempts to write
>>> a non-contiguous CBM to a system that does not support non-contiguous masks.
>>> This fails as expected, error messages printed as debugging and thus the test passes
>>> with an "ok".
>>>
>>> # Write schema "L3:0=ff0ff" to resctrl FS # write() failed : Invalid argument                                      
>>> # Non-contiguous CBMs not supported and write of non-contiguous CBM failed as expected                             
>>> ok 5 L3_NONCONT_CAT: test                             
>>
>> I don't think that this was anything to do with my changes, but I don't
>> still seem to have the test output.  (Since this test has to do with
>> bitmap schemata (?), it seemed unlikely to be affected by changes to
>> bw_validate().)
>>
>> I'll need to re-test with and without this patch to check whether it
>> makes any difference.
> 
> I finally got around to testing this on top of -rc1.
> 
> Disregarding trivial differences, the patched version (+++) doesn't
> seem to introduce any regressions over the vanilla version (---)
> (below).  (The CMT test actually failed with an out-of-tolerance result
> on the vanilla kernel only.  Possibly there was some adverse system
> load interfering.)

My first thought is that this is another unfortunate consequence of the resctrl
performance-as-functional tests.
The percentage difference you encountered is quite large, which prompted me to
take a closer look, and it does look to me as though the CMT test can be
improved. (Whether we should spend more effort on these performance tests
instead of creating new deterministic functional tests is another topic.)

> 
> 
> Looking at the code, it seems that L2_NONCONT_CAT is not gated by any
> config or mount option.  I think this is just a feature that my
> hardware doesn't support (?)

Yes, this is how I also interpret the test output.

Focusing on the CMT test ...

>  # Starting CMT test ...
>  # Mounting resctrl to "/sys/fs/resctrl"
>  # Cache size :23068672
>  # Writing benchmark parameters to resctrl FS
> -# Benchmark PID: 5135
> +# Benchmark PID: 4970
>  # Checking for pass/fail
> -# Fail: Check cache miss rate within 15%
> -# Percent diff=24
> +# Pass: Check cache miss rate within 15%
> +# Percent diff=4
>  # Number of bits: 5
> -# Average LLC val: 7942963
> +# Average LLC val: 10918297
>  # Cache span (bytes): 10485760
> -not ok 3 CMT: test
> +ok 3 CMT: test

A 24% difference followed by a 4% difference is a big swing. At a high level,
the CMT test creates a new resource group with only the test assigned to it. The test
initializes and accesses a buffer a couple of times while measuring cache occupancy.
"Success" is when the cache occupancy is within 15% of the buffer size.

I noticed a couple of places where the test is susceptible to interference and to
the system's cache architecture.
1) The cache allocation of the test's resource group overlaps with the rest of the
   system. On a busy system it is thus likely that the test's cache entries may be
   evicted.
2) The test does not account for cache architecture where, for example, there may be
   an L2 cache that can accommodate a large part of the buffer and thus not be
   reflected in the LLC occupancy count.

I started experimenting to see what it would take to reduce interference and ended up
with a change like the one below. It isolates the cache portions used by the test from
those used by the rest of the system and, if L2 cache allocation is possible, reduces
the amount of L2 cache the test can allocate into as much as possible. This opened up
another tangent: the size of the cache portion is the same as the size of the buffer,
while it is not realistic to expect a user space buffer to fit into the cache so nicely.

Even with these changes, I was not able to get the percentages to drop significantly
on my system, but it may help to reduce the swings in the numbers observed.

But, I do not see how work like this helps to improve resctrl health (compared to,
for example, just increasing the "success" percentage).

diff --git a/tools/testing/selftests/resctrl/cmt_test.c b/tools/testing/selftests/resctrl/cmt_test.c
index d09e693dc739..494e98aa8b69 100644
--- a/tools/testing/selftests/resctrl/cmt_test.c
+++ b/tools/testing/selftests/resctrl/cmt_test.c
@@ -19,12 +19,22 @@
 #define CON_MON_LCC_OCCUP_PATH		\
 	"%s/%s/mon_data/mon_L3_%02d/llc_occupancy"
 
-static int cmt_init(const struct resctrl_val_param *param, int domain_id)
+static int cmt_init(const struct resctrl_test *test,
+		    const struct user_params *uparams,
+		    const struct resctrl_val_param *param, int domain_id)
 {
+	char schemata[64];
+	int ret;
+
 	sprintf(llc_occup_path, CON_MON_LCC_OCCUP_PATH, RESCTRL_PATH,
 		param->ctrlgrp, domain_id);
 
-	return 0;
+	snprintf(schemata, sizeof(schemata), "%lx", param->mask);
+	ret = write_schemata(param->ctrlgrp, schemata, uparams->cpu, test->resource);
+	if (!ret && !strcmp(test->resource, "L3") && resctrl_resource_exists("L2"))
+		ret = write_schemata(param->ctrlgrp, "0x1", uparams->cpu, "L2");
+
+	return ret;
 }
 
 static int cmt_setup(const struct resctrl_test *test,
@@ -119,6 +129,7 @@ static int cmt_run_test(const struct resctrl_test *test, const struct user_param
 	unsigned long cache_total_size = 0;
 	int n = uparams->bits ? : 5;
 	unsigned long long_mask;
+	char schemata[64];
 	int count_of_bits;
 	size_t span;
 	int ret;
@@ -162,6 +173,11 @@ static int cmt_run_test(const struct resctrl_test *test, const struct user_param
 		param.fill_buf = &fill_buf;
 	}
 
+	snprintf(schemata, sizeof(schemata), "%lx", ~param.mask & long_mask);
+	ret = write_schemata("", schemata, uparams->cpu, test->resource);
+	if (ret)
+		return ret;
+
 	remove(RESULT_FILE_NAME);
 
 	ret = resctrl_val(test, uparams, &param);
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index c7e9adc0368f..cd4c715b7ffd 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -17,7 +17,9 @@
 #define ALLOCATION_MIN		10
 #define ALLOCATION_STEP		10
 
-static int mba_init(const struct resctrl_val_param *param, int domain_id)
+static int mba_init(const struct resctrl_test *test,
+		    const struct user_params *uparams,
+		    const struct resctrl_val_param *param, int domain_id)
 {
 	int ret;
 
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 84d8bc250539..58201f844740 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -83,7 +83,9 @@ static int check_results(size_t span)
 	return ret;
 }
 
-static int mbm_init(const struct resctrl_val_param *param, int domain_id)
+static int mbm_init(const struct resctrl_test *test,
+		    const struct user_params *uparams,
+		    const struct resctrl_val_param *param, int domain_id)
 {
 	int ret;
 
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index cd3adfc14969..9853bd746392 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -133,7 +133,9 @@ struct resctrl_val_param {
 	char			filename[64];
 	unsigned long		mask;
 	int			num_of_runs;
-	int			(*init)(const struct resctrl_val_param *param,
+	int			(*init)(const struct resctrl_test *test,
+					const struct user_params *uparams,
+					const struct resctrl_val_param *param,
 					int domain_id);
 	int			(*setup)(const struct resctrl_test *test,
 					 const struct user_params *uparams,
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
index 7c08e936572d..a5a8badb83d4 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -569,7 +569,7 @@ int resctrl_val(const struct resctrl_test *test,
 		goto reset_affinity;
 
 	if (param->init) {
-		ret = param->init(param, domain_id);
+		ret = param->init(test, uparams, param, domain_id);
 		if (ret)
 			goto reset_affinity;
 	}