lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b7578f5e-bc60-43d3-90c2-491e203a937c@intel.com>
Date: Tue, 5 Nov 2024 09:08:52 -0800
From: Reinette Chatre <reinette.chatre@...el.com>
To: Maciej Wieczor-Retman <maciej.wieczor-retman@...el.com>,
	<fenghua.yu@...el.com>, <shuah@...nel.org>
CC: <linux-kernel@...r.kernel.org>, <linux-kselftest@...r.kernel.org>,
	<ilpo.jarvinen@...ux.intel.com>, <tony.luck@...el.com>
Subject: Re: [PATCH v5 2/2] selftests/resctrl: Adjust SNC support messages

Hi Maciej,

On 10/29/24 6:00 AM, Maciej Wieczor-Retman wrote:
> Resctrl selftest prints a message on test failure that Sub-Numa
> Clustering (SNC) could be enabled and points the user to check their BIOS
> settings. No actual check is performed before printing that message so
> it is not very accurate in pinpointing a problem.
> 
> Figuring out if SNC is enabled is only one part of the problem, the
> others being whether the detected SNC mode is reliable and whether the
> kernel supports SNC in resctrl.

Starting to sound like previous patch ...

> 
> When there is SNC support for kernel's resctrl subsystem and SNC is
> enabled then sub node files are created for each node in the resctrlfs.
> The sub node files exist in each regular node's L3 monitoring directory.
> The reliable path to check for existence of sub node files is
> /sys/fs/resctrl/mon_data/mon_L3_00/mon_sub_L3_00.

... this is about previous patch ...

> 
> To check if SNC detection is reliable one can check the
> /sys/devices/system/cpu/offline file. If it's empty, it means all cores
> are operational and the ratio should be calculated correctly. If it has
> any contents, it means the detected SNC mode can't be trusted and should
> be disabled.

... also about previous patch ...

> 
> Add helpers for all operations mentioned above.

Not done in this patch.

> 
> Detect SNC mode once and let other tests inherit that information.

Not done in this patch.

> 
> Add messages to alert the user when SNC detection could return incorrect
> results. Correct old messages to account for kernel support of SNC in
> resctrl.

Sounds like description of what earlier version of this patch did ...

> 
> Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman@...el.com>
> ---
> Changelog v5:
> - Move all resctrlfs.c code from this patch to 1/2. (Reinette)

Please update the changelog to match the changes made to the patch.

> - Remove kernel support check and error message from CAT since it can't
>   be happen.
> - Remove snc checks in CAT since snc doesn't affect it here.
> - Skip MBM, MBA and CMT tests if snc is unreliable.
> 
> Changelog v4:
> - Change messages at the end of tests and at the start of
>   run_single_test. (Reinette)
> - Add messages at the end of CAT since it can also fail due to enabled
>   SNC + lack of kernel support.
> - Remove snc_mode global variable. (Reinette)
> - Fix wrong description of snc_kernel_support(). (Reinette)
> - Move call to cpus_offline_empty() into snc_nodes_per_l3_cache() so the
>   whole detection flow is in one place as discussed. (Reinette)
> 
> Changelog v3:
> - Change snc_ways() to snc_nodes_per_l3_cache(). (Reinette)
> - Add printing the discovered SNC mode. (Reinette)
> - Change method of kernel support discovery from cache sizes to
>   existance of sub node files.
> - Check if SNC detection is unreliable.
> - Move SNC detection to only the first run_single_test() instead on
>   error at the end of test runs.
> - Add global value to remind user at the end of relevant tests if SNC
>   detection was found to be unreliable.
> - Redo the patch message after the changes.
> 
> Changelog v2:
> - Move snc_ways() checks from individual tests into
>   snc_kernel_support().
> - Write better comment for snc_kernel_support().
> 
>  tools/testing/selftests/resctrl/cmt_test.c |  8 ++++++--
>  tools/testing/selftests/resctrl/mba_test.c |  8 +++++++-
>  tools/testing/selftests/resctrl/mbm_test.c | 10 +++++++---
>  tools/testing/selftests/resctrl/resctrl.h  |  3 +++
>  4 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/testing/selftests/resctrl/cmt_test.c b/tools/testing/selftests/resctrl/cmt_test.c
> index 0c045080d808..1470bd64d158 100644
> --- a/tools/testing/selftests/resctrl/cmt_test.c
> +++ b/tools/testing/selftests/resctrl/cmt_test.c
> @@ -133,6 +133,10 @@ static int cmt_run_test(const struct resctrl_test *test, const struct user_param
>  	ret = get_cache_size(uparams->cpu, "L3", &cache_total_size);
>  	if (ret)
>  		return ret;
> +
> +	if ((get_vendor() == ARCH_INTEL) && snc_unreliable)
> +		ksft_exit_skip("Sub-NUMA Clustering could not be detected properly. Skipping...\n");
> +

This (inconsistent) duplication can be moved to run_single_test().

>  	ksft_print_msg("Cache size :%lu\n", cache_total_size);
>  
>  	count_of_bits = count_bits(long_mask);
> @@ -175,8 +179,8 @@ static int cmt_run_test(const struct resctrl_test *test, const struct user_param
>  		goto out;
>  
>  	ret = check_results(&param, span, n);
> -	if (ret && (get_vendor() == ARCH_INTEL))
> -		ksft_print_msg("Intel CMT may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
> +	if (ret && (get_vendor() == ARCH_INTEL) && !snc_kernel_support())
> +		ksft_print_msg("Kernel doesn't support Sub-NUMA Clustering but it is enabled on the system.\n");
>  
>  out:
>  	free(span_str);
> diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
> index ab8496a4925b..8f4e198da047 100644
> --- a/tools/testing/selftests/resctrl/mba_test.c
> +++ b/tools/testing/selftests/resctrl/mba_test.c
> @@ -170,15 +170,21 @@ static int mba_run_test(const struct resctrl_test *test, const struct user_param
>  		.setup		= mba_setup,
>  		.measure	= mba_measure,
>  	};
> -	int ret;
> +	int ret, snc_support;
>  
>  	remove(RESULT_FILE_NAME);
>  
> +	snc_support = snc_kernel_support();
> +	if ((get_vendor() == ARCH_INTEL) && snc_unreliable)
> +		ksft_exit_skip("Sub-NUMA Clustering could not be detected properly. Skipping...\n");
> +
>  	ret = resctrl_val(test, uparams, uparams->benchmark_cmd, &param);
>  	if (ret)
>  		return ret;
>  
>  	ret = check_results();
> +	if (ret && (get_vendor() == ARCH_INTEL) && !snc_support)
> +		ksft_print_msg("Kernel doesn't support Sub-NUMA Clustering but it is enabled on the system.\n");
>  
>  	return ret;
>  }
> diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
> index 6b5a3b52d861..a68f70589b91 100644
> --- a/tools/testing/selftests/resctrl/mbm_test.c
> +++ b/tools/testing/selftests/resctrl/mbm_test.c
> @@ -138,17 +138,21 @@ static int mbm_run_test(const struct resctrl_test *test, const struct user_param
>  		.setup		= mbm_setup,
>  		.measure	= mbm_measure,
>  	};
> -	int ret;
> +	int ret, snc_support;
>  
>  	remove(RESULT_FILE_NAME);
>  
> +	snc_support = snc_kernel_support();
> +	if ((get_vendor() == ARCH_INTEL) && snc_unreliable)
> +		ksft_exit_skip("Sub-NUMA Clustering could not be detected properly. Skipping...\n");
> +
>  	ret = resctrl_val(test, uparams, uparams->benchmark_cmd, &param);
>  	if (ret)
>  		return ret;
>  
>  	ret = check_results(DEFAULT_SPAN);
> -	if (ret && (get_vendor() == ARCH_INTEL))
> -		ksft_print_msg("Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
> +	if (ret && (get_vendor() == ARCH_INTEL) && !snc_support)
> +		ksft_print_msg("Kernel doesn't support Sub-NUMA Clustering but it is enabled on the system.\n");
>  
>  	return ret;
>  }
> diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
> index 851b37c9c38a..488bdca01e4f 100644
> --- a/tools/testing/selftests/resctrl/resctrl.h
> +++ b/tools/testing/selftests/resctrl/resctrl.h
> @@ -121,6 +121,8 @@ struct perf_event_read {
>   */
>  extern volatile int *value_sink;
>  
> +extern int snc_unreliable;
> +
>  extern char llc_occup_path[1024];
>  
>  int snc_nodes_per_l3_cache(void);
> @@ -167,6 +169,7 @@ void ctrlc_handler(int signum, siginfo_t *info, void *ptr);
>  int signal_handler_register(const struct resctrl_test *test);
>  void signal_handler_unregister(void);
>  unsigned int count_bits(unsigned long n);
> +int snc_kernel_support(void);
>  
>  void perf_event_attr_initialize(struct perf_event_attr *pea, __u64 config);
>  void perf_event_initialize_read_format(struct perf_event_read *pe_read);

Reinette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ