linux-kernel - Re: [PATCH v19 14/20] x86/resctrl: Fill out rmid_read structure for smp

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0e7dcfd4-a7a4-4205-b848-801540c28234@intel.com>
Date: Thu, 30 May 2024 13:23:59 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghua.yu@...el.com>, Maciej
 Wieczor-Retman <maciej.wieczor-retman@...el.com>, Peter Newman
	<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
	<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
	<Dave.Martin@....com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
	<patches@...ts.linux.dev>
Subject: Re: [PATCH v19 14/20] x86/resctrl: Fill out rmid_read structure for
 smp_call*() to read a counter

Hi Tony,

On 5/28/24 3:19 PM, Tony Luck wrote:
> mon_event_read() fills out most fields of the struct rmid_read that is
> passed via an smp_call*() function to a CPU that is part of the correct
> domain to read the monitor counters.
> 
> The one exception is the sumdomains field that is set by the caller
> (either rdtgroup_mondata_show() or mon_add_all_files()).
> 
> When rmid_read.sumdomains is false:
> 	The domain field "d" specifies the only domain to read
> 	CPU to execute is chosen from d->hdr.cpu_mask
> 
> When rmid_read.sumdomains is true:
> 	The domain field is NULL.
> 	The cache_info field indicates that all domains
> 	that are part of that cache instance should be
> 	summed.
> 	CPU to execute is chosen from ci->shared_cpu_mask
> 
> Signed-off-by: Tony Luck <tony.luck@...el.com>
> ---
>   arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 28 ++++++++++++++++++-----
>   1 file changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index 3b9383612c35..4e394400e575 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -517,6 +517,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>   		    struct rdt_mon_domain *d, struct rdtgroup *rdtgrp,
>   		    int evtid, int first)
>   {
> +	cpumask_t *cpumask;
>   	int cpu;
>   
>   	/* When picking a CPU from cpu_mask, ensure it can't race with cpuhp */
> @@ -537,7 +538,8 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>   		return;
>   	}
>   
> -	cpu = cpumask_any_housekeeping(&d->hdr.cpu_mask, RESCTRL_PICK_ANY_CPU);
> +	cpumask = rr->sumdomains ? &rr->ci->shared_cpu_map : &d->hdr.cpu_mask;
> +	cpu = cpumask_any_housekeeping(cpumask, RESCTRL_PICK_ANY_CPU);
>   
>   	/*
>   	 * cpumask_any_housekeeping() prefers housekeeping CPUs, but
> @@ -546,7 +548,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>   	 * counters on some platforms if its called in IRQ context.
>   	 */
>   	if (tick_nohz_full_cpu(cpu))
> -		smp_call_function_any(&d->hdr.cpu_mask, mon_event_count, rr, 1);
> +		smp_call_function_any(cpumask, mon_event_count, rr, 1);
>   	else
>   		smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
>   

Why not provide the cpumask as parameter to mon_event_read() as I asked
about in previous version (again feedback that was totally ignored)? With
this implementation there is a portion of SNC logic in rdtgroup_mondata_show()
and another portion of logic in mon_event_read(). Scattering SNC quirk logic like
this makes this very hard to understand.

To help with understanding this code I asked _twice_  ([1] and [2]) to
_at_least_ provide comments for these SNC branches but even a request for comments
is totally ignored. I even provided some comments based on my understanding
that could have just been copied&pasted (if it was correct), but that was ignored
also. I understand this work has been taking a while and to support this I am
trying to spend more time to review and provide more detailed feedback but to
find it just to be ignored over and over is extremely frustrating and wasting
so much of my time. I do not expect that you do everything as I propose but if
I propose something silly then please point it out so that I can learn? At this
point I am convinced that you find my feedback not worth responding
to leaving us stuck with me who keep trying to review your work and getting
ignored over and over in every new version.

What should I do Tony? Why should I keep reviewing this work? Asking to address
review feedback should not be necessary yet I seem to keep doing it. An attempt
at an ultimatum was futile since it was just dodged [3] with burden placed right
back on me.

> @@ -575,15 +577,29 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
>   	resid = md.u.rid;
>   	domid = md.u.domid;
>   	evtid = md.u.evtid;
> -
> +	rr.sumdomains = md.u.sum;
>   	r = &rdt_resources_all[resid].r_resctrl;
> -	hdr = rdt_find_domain(&r->mon_domains, domid, NULL);
> -	if (!hdr || WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN)) {
> +
> +	if (rr.sumdomains) {

	/* Explain what this does and why */

> +		list_for_each_entry(d, &r->mon_domains, hdr.list) {
> +			if (d->ci->id == domid) {
> +				rr.ci = d->ci;
> +				d = NULL;
> +				goto got_cacheinfo;
> +			}
> +		}
>   		ret = -ENOENT;
>   		goto out;
> +	} else {
> +		hdr = rdt_find_domain(&r->mon_domains, domid, NULL);
> +		if (!hdr || WARN_ON_ONCE(hdr->type != RESCTRL_MON_DOMAIN)) {
> +			ret = -ENOENT;
> +			goto out;
> +		}
> +		d = container_of(hdr, struct rdt_mon_domain, hdr);
>   	}
> -	d = container_of(hdr, struct rdt_mon_domain, hdr);
>   
> +got_cacheinfo:

Something like "read_event" would be more appropriate since "got_cacheinfo"
has no relevance to non-SNC.

>   	mon_event_read(&rr, r, d, rdtgrp, evtid, false);
>   
>   	if (rr.err == -EIO)

Reinette

[1] https://lore.kernel.org/lkml/2a7cc72a-99fb-4862-b7eb-da3d515f0453@intel.com/
[2] https://lore.kernel.org/lkml/ccd5df99-4d7e-4224-a07e-3897e370b53e@intel.com/
[3] https://lore.kernel.org/lkml/41cc8a35-81fb-b890-f963-8dc9524a54b0@intel.com/