Message-ID: <d7d10322-644c-465a-b0f3-7d3afcb78217@intel.com>
Date: Tue, 25 Jun 2024 16:35:06 -0700
From: Reinette Chatre <reinette.chatre@...el.com>
To: Tony Luck <tony.luck@...el.com>, Fenghua Yu <fenghua.yu@...el.com>, Maciej
Wieczor-Retman <maciej.wieczor-retman@...el.com>, Peter Newman
<peternewman@...gle.com>, James Morse <james.morse@....com>, Babu Moger
<babu.moger@....com>, Drew Fustini <dfustini@...libre.com>, Dave Martin
<Dave.Martin@....com>
CC: <x86@...nel.org>, <linux-kernel@...r.kernel.org>,
<patches@...ts.linux.dev>
Subject: Re: [PATCH v21 15/18] x86/resctrl: Make __mon_event_count() handle
sum domains
Hi Tony,
On 6/21/24 3:38 PM, Tony Luck wrote:
> Legacy resctrl monitor files must provide the sum of event values across
> all Sub-NUMA Cluster (SNC) domains that share an L3 cache instance.
>
> There are now two cases:
> 1) A specific domain is provided in struct rmid_read
> This is either a non-SNC system, or the request is to read data
> from just one SNC node.
> 2) Domain pointer is NULL. In this case the cacheinfo field in struct
> rmid_read indicates that all SNC nodes that share that L3 cache
> instance should have the event read and return the sum of all
> values.
>
> Update the CPU sanity check. The existing check that an event is read
> from a CPU in the requested domain still applies when reading a single
> domain. But when summing across domains a more relaxed check that the
> current CPU is in the scope of the L3 cache instance is appropriate
> since the MSRs to read events are scoped at L3 cache level.
>
> Signed-off-by: Tony Luck <tony.luck@...el.com>
> ---
> arch/x86/kernel/cpu/resctrl/monitor.c | 50 ++++++++++++++++++++++-----
> 1 file changed, 41 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
> index 877d898e8fd0..6812560bee3c 100644
> --- a/arch/x86/kernel/cpu/resctrl/monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl/monitor.c
> @@ -324,9 +324,6 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
>
> resctrl_arch_rmid_read_context_check();
>
> - if (!cpumask_test_cpu(smp_processor_id(), &d->hdr.cpu_mask))
> - return -EINVAL;
> -
> prmid = logical_rmid_to_physical_rmid(cpu, rmid);
> ret = __rmid_read_phys(prmid, eventid, &msr_val);
> if (ret)
> @@ -592,7 +589,10 @@ static struct mbm_state *get_mbm_state(struct rdt_mon_domain *d, u32 closid,
>
> static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
> {
> + int cpu = smp_processor_id();
> + struct rdt_mon_domain *d;
> struct mbm_state *m;
> + int err, ret;
> u64 tval = 0;
>
> if (rr->first) {
> @@ -603,14 +603,46 @@ static int __mon_event_count(u32 closid, u32 rmid, struct rmid_read *rr)
> return 0;
> }
>
> - rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid, rr->evtid,
> - &tval, rr->arch_mon_ctx);
> - if (rr->err)
> - return rr->err;
> + if (rr->d) {
> + /* Reading a single domain, must be on a CPU in that domain. */
> + if (!cpumask_test_cpu(cpu, &rr->d->hdr.cpu_mask))
> + return -EINVAL;
> + rr->err = resctrl_arch_rmid_read(rr->r, rr->d, closid, rmid,
> + rr->evtid, &tval, rr->arch_mon_ctx);
> + if (rr->err)
> + return rr->err;
>
> - rr->val += tval;
> + rr->val += tval;
>
> - return 0;
> + return 0;
> + }
> +
> + /* Summing domains that share a cache, must be on a CPU for that cache. */
> + if (!cpumask_test_cpu(cpu, &rr->ci->shared_cpu_map))
> + return -EINVAL;
> +
> + /*
> + * Legacy files must report the sum of an event across all
> + * domains that share the same L3 cache instance. But newly
> + * created domains with no traffic may report -EINVAL/Unavailable.
> + * Report success if a read from any domain succeeds.
> + */
The snippet of code you copied the comment from actually kept the
original error instead of overriding it to be -EINVAL as is done here.
The code may be ok, since a sum domain may be specified to report
"Unavailable" in a scenario where a domain it is trying to include in
the sum returns "Error". It may be simplest to just drop the
"But newly created ... " sentence and have the last sentence be:
	Report success if a read from any domain succeeds, -EINVAL
	(translated to "Unavailable" for user space) if reading from
	all domains fails for any reason.
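For illustration only, the semantics of the suggested comment can be sketched in stand-alone C outside the kernel. This is a minimal sketch, not the patch itself: struct fake_domain, its fields, and sum_domains() are hypothetical stand-ins for struct rdt_mon_domain and the list_for_each_entry() loop quoted below, and read_err simply models whatever resctrl_arch_rmid_read() would return for that domain.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one SNC domain and the result of reading it. */
struct fake_domain {
	int cache_id;	/* L3 cache instance the domain belongs to */
	int read_err;	/* 0 on success, negative errno on failure */
	uint64_t count;	/* event value returned by a successful read */
};

/*
 * Sum an event over all domains that share cache_id.
 * Returns 0 if a read from any matching domain succeeds, -EINVAL if
 * reading from all matching domains fails for any reason. The original
 * per-domain error code is deliberately not preserved.
 */
static int sum_domains(const struct fake_domain *doms, size_t n,
		       int cache_id, uint64_t *val)
{
	int ret = -EINVAL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (doms[i].cache_id != cache_id)
			continue;
		if (!doms[i].read_err) {
			*val += doms[i].count;
			ret = 0;
		}
	}

	return ret;
}
```

With this shape, one failing domain next to one succeeding domain still yields success with a partial sum, and only the case where every matching domain fails is reported as -EINVAL.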
> + ret = -EINVAL;
> + list_for_each_entry(d, &rr->r->mon_domains, hdr.list) {
> + if (d->ci->id != rr->ci->id)
> + continue;
> + err = resctrl_arch_rmid_read(rr->r, d, closid, rmid,
> + rr->evtid, &tval, rr->arch_mon_ctx);
> + if (!err) {
> + rr->val += tval;
> + ret = 0;
> + }
> + }
> +
> + if (ret)
> + rr->err = ret;
> +
> + return ret;
> }
>
> /*
Reinette