lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ae43b9c3-15f9-417d-aad3-ee48987c9b90@intel.com>
Date: Thu, 20 Jun 2024 14:31:51 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghua.yu@...el.com>, Maciej
 Wieczor-Retman <maciej.wieczor-retman@...el.com>, Peter Newman
	<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
	<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
	<Dave.Martin@....com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
	<patches@...ts.linux.dev>
Subject: Re: [PATCH v20 15/18] x86/resctrl: Make __mon_event_count() handle
 sum domains

Hi Tony,

On 6/10/24 11:35 AM, Tony Luck wrote:
> Legacy resctrl monitor files must provide the sum of event values across
> all Sub-NUMA Cluster (SNC) domains that share an L3 cache instance.
> 
> There are now two cases:
> 1) A specific domain is provided in struct rmid_read
>     This is either a non-SNC system, or the request is to read data
>     from just one SNC node.
> 2) Domain pointer is NULL. In this case the cacheinfo field in struct
>     rmid_read indicates that all SNC nodes that share that L3 cache
>     instance should have the event read and return the sum of all
>     values.
> 
> Update the CPU sanity check. The existing check that an event is read
> from a CPU in the requested domain still applies when reading a single
> domain. But when summing across domains a more relaxed check that the
> current CPU is in the scope of the L3 cache instance is appropriate
> since the MSRs to read events are scoped at L3 cache level.
> 
> Signed-off-by: Tony Luck <tony.luck@...el.com>
> ---
>   arch/x86/kernel/cpu/resctrl/monitor.c | 40 +++++++++++++++++++++------
>   1 file changed, 32 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index f2fd35d294f2..c4d9a8df8d2d 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -324,9 +324,6 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
>   
>   	resctrl_arch_rmid_read_context_check();
>   
> -	if (!cpumask_test_cpu(smp_processor_id(), &d->hdr.cpu_mask))
> -		return -EINVAL;
> -
>   	prmid = logical_rmid_to_physical_rmid(cpu, rmid);
>   	ret = __rmid_read_phys(prmid, eventid, &msr_val);
>   	if (ret)
> @@ -592,6 +589,8 @@ static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
>   
>   static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
>   {
> +	int cpu = smp_processor_id();
> +	struct rdt_mon_domain *d;
>   	struct mbm_state *m;
>   	u64 tval = 0;
>   
> @@ -603,12 +602,37 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
>   		return 0;
>   	}
>   
> -	rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid, rr->evtid,
> -					 &tval, rr->arch_mon_ctx);
> -	if (rr->err)
> -		return rr->err;
> +	if (rr->d) {
> +		/* Reading a single domain, must be on a CPU in that domain */

(nit: Please let this sentence as well as the later comment end with period.)

> +		if (!cpumask_test_cpu(cpu, &rr->d->hdr.cpu_mask))
> +			return -EINVAL;
> +		rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid,
> +						 rr->evtid, &tval, rr->arch_mon_ctx);
> +		if (rr->err)
> +			return rr->err;
> +
> +		rr->val += tval;
> +
> +		return 0;
> +	}
>   
> -	rr->val += tval;
> +	/* Summing domains that share a cache, must be on a CPU for that cache */
> +	if (!cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map))
> +		return -EINVAL;
> +
> +	/*
> +	 * Legacy files must report the sum of an event across all
> +	 * domains that share the same L3 cache instance.
> +	 */
> +	list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
> +		if (d->ci->id != rr->ci->id)
> +			continue;
> +		rr->err = resctrl_arch_rmid_read(rr->r, d, closid, rmid,
> +						 rr->evtid, &tval, rr->arch_mon_ctx);
> +		if (rr->err)
> +			return rr->err;

An error here means that the hardware does not have data available to return. Should
the unavailability of data for one domain be considered an error for all? Note how
this is handled in mon_event_count() where the error is discarded if any of the monitor
event reads succeeded ... should this be done here also?

> +		rr->val += tval;
> +	}
>   
>   	return 0;
>   }


Reinette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ